GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE 
ANTITUMOR ANTIBIOTIC C-1027 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. §119 of provisional 
application USSN 60/115,434, filed on January 6, 1999, which is herein incorporated by 
reference in its entirety for all purposes. 

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

This work was supported in part by a grant from the Cancer Research 
Coordinating Committee, University of California, the National Institutes of Health grant 
CA78747, and the Searle Scholars Program/The Chicago Community Trust. The 
Government of the United States of America may have certain rights in this invention. 

FIELD OF THE INVENTION 

This invention relates to the field of enediyne antibiotics. In particular, this 
invention elucidates the gene cluster controlling the biosynthesis of the C-1027 enediyne. 

BACKGROUND OF THE INVENTION 

The enediyne antibiotics are currently the focus of intense research activity in 
the fields of chemistry, biology, and medical sciences, because of their unique molecular 
architecture, biological activities, and modes of action (Doyle and Borders (1995) Enediyne 
antibiotics as antitumor agents. Marcel-Dekker, New York; Thorson et al. (1999) Bioorg. 
Chem., 27: 172-188). Since the unveiling of the structure of neocarzinostatin chromophore 
(Edo et al. (1985) Tetrahedron Lett. 26: 331-340) in 1985, the enediyne family has grown 
steadily. Thus far, there have been three basic groups within the enediyne antibiotic family: 
(a) the calicheamicin/esperamicin type, which includes the calicheamicins, the esperamicins, 
and namenamicin, (b) the dynemicin type, and (c) the chromoprotein type, consisting of an 
apoprotein and an unstable enediyne chromophore. The latter group includes 
neocarzinostatin, kedarcidin, C-1027 (Fig. 1), and maduropeptin, whose enediyne 
chromophore structures have been established, as well as several others whose enediyne 
chromophore structures are yet to be determined due to their instability (Thorson et al. 
(1999) Bioorg. Chem., 27: 172-188). N1999A2, in contrast to the other chromoproteins, 
exists as an enediyne chromophore alone despite the fact that its structure is very similar to 
those of the other chromoprotein chromophores (Ando et al. (1998) Tetra. Letts., 39: 6495-6480). 

As a family, the enediyne antibiotics are the most potent, highly active 
antitumor agents ever discovered. Some members are 1000 times more potent than 
adriamycin, one of the most effective, clinically used antitumor antibiotics (Zhen et al. 
(1989) J. Antibiot. 42: 1294-1298). All members of this family contain a unit consisting of 
two acetylenic groups conjugated to a double bond or incipient double bond within a nine- or 
ten-membered ring; i.e., the enediyne core as exemplified by C-1027 in Fig. 1. As a 
consequence of this structural feature, these compounds share a common mechanism of 
action: the enediyne core undergoes an electronic rearrangement to form a transient 
benzenoid diradical, which is positioned in the minor groove of DNA so as to damage DNA 
by abstracting hydrogen atoms from deoxyriboses on both strands (Fig. 1). Reaction of the 
resulting deoxyribose carbon-centered radicals with molecular oxygen initiates a process that 
results in both single-strand and double-strand DNA cleavages (Doyle and Borders (1995) 
Enediyne antibiotics as antitumor agents. Marcel-Dekker, New York; Ikemoto et al. (1995) 
Proc. Natl. Acad. Sci. USA 92: 10506-10510; Myers et al. (1997) J. Am. Chem. Soc. 119: 
2965-2972; Stassinopoulos et al. (1996) Science 272: 1943-1946; Thorson et al. (1999) 
Bioorg. Chem., 27: 172-188; Xu et al. (1997) J. Am. Chem. Soc. 119: 1133-1134). This 
novel mechanism of DNA damage has important implications for their application as potent 
cancer chemotherapeutic agents (Doyle and Borders (1995) supra; Sievers et al. (1999) 
Blood 93: 3678-3684). 

Manipulation of genes governing secondary metabolism offers a promising 
alternative to chemical synthesis for making structural analogs of microbial metabolites, 
allowing preparation of these compounds biosynthetically (Cane et al. 
(1998) Science 282: 63-68; Hutchinson and Fujii (1995) Ann. Rev. Microbiol. 49: 201-38; 
Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875-912). The success of this 
approach depends critically on the availability of novel genetic systems and on genes 
encoding novel enzyme activities. The enediynes offer a distinct opportunity to study the 
biosynthesis of their unique molecular scaffolds and the mechanism of self-resistance to 
extremely cytotoxic natural products. Elucidation of these aspects provides access to 
rational engineering of enediyne biosynthesis for novel drug leads and makes it possible to 
construct enediyne overproducing strains by de-regulating the biosynthetic machinery. In 
addition, elucidation of an enediyne gene cluster contributes to the general field of 
combinatorial biosynthesis by expanding the repertoire of novel polyketide synthase (PKS) 
and deoxysugar biosynthesis genes as well as other genes uniquely associated with enediyne 
biosynthesis, leading to the making of novel enediynes via combinatorial biosynthesis. 

SUMMARY OF THE INVENTION 

This invention provides nucleic acid sequences and characterization of the 
gene cluster responsible for the biosynthesis of the enediyne C-1027 (produced by 
Streptomyces globisporus). In particular, structural and functional characterization is 
provided for the 50 open reading frames (ORFs) comprising this gene cluster. Thus, in one 
embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid 
selected from the group consisting of a nucleic acid encoding any of C-1027 open reading 
frames (ORFs) -7 through 42, excluding ORF 9 (cagA); a nucleic acid encoding a 
polypeptide encoded by any of C-1027 open reading frames (ORFs) -7 through 42, excluding 
ORF 9 (cagA); and a nucleic acid amplified by polymerase chain reaction (PCR) using 
primer pairs that amplify any of C-1027 open reading frames (ORFs) -7 through 42, 
excluding ORF 9 (cagA). In one embodiment, preferred nucleic acids comprise a nucleic 
acid encoding at least two (more preferably at least three or more) open reading frames 
(ORFs) selected from the group consisting of ORF-1 through ORF 42, excluding ORF 9 
(cagA). 

In another embodiment this invention provides an isolated nucleic acid 
comprising a nucleic acid that specifically hybridizes under stringent conditions to an open 
reading frame (ORF) of the C-1027 biosynthesis gene cluster, excluding ORF 9 (cagA), and 
can substitute for the ORF to which it specifically hybridizes to direct the synthesis of an 
enediyne. In certain embodiments this also includes nucleic acids that would stringently 
hybridize as indicated above, but for the degeneracy of the nucleic acid code. In other words, 
if silent mutations could be made in the subject sequence so that it hybridizes to the indicated 
sequence(s) under stringent conditions, it would be included in certain embodiments. 
Particularly preferred nucleic acids comprise a nucleic acid that specifically hybridizes 
under stringent conditions to a nucleic acid selected from the group consisting of ORF -7, 
ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, 
ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 
16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 
26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 
36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. Particularly preferred 
isolated nucleic acids comprise a nucleic acid selected from the group consisting of ORF -7, 
ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, 
ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 
16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 
26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 
36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. The nucleic acid may 
comprise a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid 
selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, 
ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, 
ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, 
ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, 
ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, 
ORF 41, and ORF 42. 

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a 
C-1027 enediyne analogue. The gene cluster may be present in a cell, more preferably in a 
bacterial cell (e.g. Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or 
Streptomycetes). Particularly preferred bacterial cells include, but are not limited to, 
Streptomyces globisporus, Streptomyces lividans, Streptomyces coelicolor, Micromonospora 
echinospora ssp. calichensis, Actinomadura verrucosospora, Micromonospora chersina, 
Streptomyces carzinostaticus, and Actinomycete L585-6. The gene cluster may contain one 
or more open reading frames operatively linked to a heterologous promoter (e.g. a 
constitutive or an inducible promoter). 

This invention also provides for a polypeptide encoded by any one or more 
of the nucleic acids described herein. 

Also provided are host cell(s) (e.g. eukaryotic cells or bacterial cells as 
described herein) transformed with one or more of the expression vectors described herein. 
Preferred host cells are transformed with an exogenous nucleic acid comprising a gene 
cluster encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a 
C-1027 enediyne analogue. In certain embodiments, the heterologous nucleic acid may comprise 
only a portion of the gene cluster, but the cell will still be able to express an enediyne. 




This invention also provides methods of chemically modifying a biological 
molecule. The methods involve contacting a biological molecule that is a substrate for a 
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame, with a 
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame whereby the 
polypeptide chemically modifies the biological molecule. In one preferred embodiment, the 
polypeptide is an enzyme selected from the group consisting of a hydroxylase, a 
homocysteine synthase, a dNDP-glucose dehydrogenase, a citrate carrier protein, a C-methyl 
transferase, an N-methyl transferase, an aminotransferase, a CagA apoprotein, an 
NDP-glucose synthase, an epimerase, an acyl transferase, a coenzyme F390 synthase, an 
epoxide hydrolase, an anthranilate synthase, a glycosyl transferase, a monooxygenase, a 
type II condensation protein, an aminomutase, a type II adenylation protein, an O-methyl 
transferase, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In a preferred 
embodiment the method involves contacting the biological molecule with at least two 
(preferably at least three or more) different polypeptides encoded by C-1027 biosynthesis 
gene cluster open reading frames. The contacting may be in a host cell (e.g. a eukaryotic cell 
or a bacterial cell) or the contacting can be ex vivo. The biological molecule can be an 
endogenous metabolite produced by said host cell or an exogenously supplied metabolite. In 
preferred embodiments, the host cell is a bacterial cell or eukaryotic cell (e.g., a mammalian 
cell, a yeast cell, a plant cell, a fungal cell, an insect cell, etc.). In certain preferred 
embodiments, the host cell synthesizes sugars and glycosylates the biological molecule. In 
other preferred embodiments, the host cell synthesizes deoxysugars. The method can further 
involve contacting the biological molecule with a polyketide synthase or a non-ribosomal 
polypeptide synthetase. The contacting can be in a cell (e.g., a bacterial cell) or ex vivo. In 
one preferred embodiment the method comprises contacting the biological molecule with 
substantially all of the polypeptides encoded by C-1027 biosynthesis gene cluster open 
reading frames and said method produces an enediyne or enediyne analogue. In another 
preferred embodiment, the biological molecule is a fatty acid and the biological molecule is 
contacted with a C-1027 orf polypeptide selected from the group consisting of an epoxide 
hydrase, a monooxygenase, an iron-sulfur flavoprotein, a P-450 hydroxylase, an 
oxidoreductase, and a proline oxidase. In certain embodiments, the biological molecule is a 
fatty acid and said biological molecule is contacted with a plurality of C-1027 orf 
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, 
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In one especially preferred 
embodiment, the biological molecule is contacted with polypeptides encoded by ORF 17, 
ORF 20, ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38. In another especially 
preferred embodiment, the biological molecule is contacted with polypeptides encoded by 
ORF 15, ORF 16, ORF 28, ORF 3, ORF 14, and ORF 13, and, in certain embodiments, ORF 
4 and ORF 3 as well. 

In certain embodiments, the method may comprise contacting a sugar with 
one or more C-1027 open reading frame polypeptides selected from the group consisting of a 
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a 
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. Particularly 
preferred variants of this method comprise contacting a dNDP-glucose with a plurality of 
C-1027 open reading frame polypeptides comprising a dNDP-glucose synthase, a dNDP 
glucose dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an 
N-methyltransferase, and a glycosyl transferase. 

In certain other embodiments, the method comprises contacting an amino acid 
with one or more C-1027 open reading frame polypeptides selected from the group 
consisting of a hydroxylase, an aminomutase, a type II NRPS condensation enzyme, a type II 
NRPS adenylation enzyme, and a type II peptidyl carrier protein. These methods may 
involve contacting an amino acid with a plurality of C-1027 open reading frame polypeptides 
comprising a hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation 
enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier protein. In 
particularly preferred embodiments, the amino acid is a tyrosine. 

This invention also provides a method of synthesizing a chromoprotein-type 
enediyne core, said method comprising contacting a fatty acid with one or more C-1027 orf 
polypeptides selected from the group consisting of an epoxide hydrase, a monooxygenase, an 
iron-sulfur flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In 
preferred embodiments, the fatty acid may be contacted with a plurality of C-1027 orf 
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, 
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In particularly preferred 
embodiments, the fatty acid is contacted with polypeptides encoded by ORF 17, ORF 20, 
ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38. 

In yet another embodiment, this invention provides a method of 
synthesizing a deoxysugar. This method involves contacting a sugar with one or more 
C-1027 open reading frame polypeptides selected from the group consisting of a dNDP-glucose 
synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a 
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. In preferred 
embodiments, this method involves contacting a dNDP-glucose with a plurality of C-1027 
open reading frame polypeptides comprising a dNDP-glucose synthase, a dNDP glucose 
dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an 
N-methyltransferase, and a glycosyl transferase. In particularly preferred embodiments, the 
dNDP-glucose is contacted with polypeptides encoded by ORF 17, ORF 20, ORF 21, ORF 29, 
ORF 30, ORF 32, ORF 35, and ORF 38. 

This invention also provides methods of synthesizing a beta amino acid by 
contacting an amino acid with one or more C-1027 open reading frame polypeptides 
selected from the group consisting of a hydroxylase, an aminomutase, a type II NRPS 
condensation enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier 
protein. The method preferably comprises contacting an amino acid with a plurality of 
C-1027 open reading frame polypeptides comprising a hydroxylase, a halogenase, an 
aminomutase, a type II NRPS condensation enzyme, a type II NRPS adenylation enzyme, 
and a type II peptidyl carrier protein. Particularly preferred embodiments comprise 
contacting the amino acid (e.g. tyrosine) with polypeptides encoded by ORF 4, ORF 11, 
ORF 24, ORF 23, ORF 25, and ORF 26. 

Also provided are methods of synthesizing an enediyne or an enediyne 
analogue. These methods involve culturing a cell (e.g. a eukaryotic cell or a bacterium) 
comprising a recombinantly modified C-1027 gene cluster under conditions whereby said 
cell expresses said enediyne or enediyne analogue; and recovering the enediyne or enediyne 
analogue. In preferred embodiments, the gene cluster is present in a bacterium (e.g., 
Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or Streptomycetes). 
Particularly preferred bacteria include, but are not limited to, Streptomyces globisporus, 
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. 
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces 
carzinostaticus, and Actinomycete L585-6. In another preferred embodiment, the gene 
cluster is present in a eukaryotic cell (e.g. a mammalian cell, a yeast cell, a plant cell, a 
fungal cell, an insect cell, etc.). The host cell can be one that synthesizes sugars and 
glycosylates the enediyne or enediyne analogue. The host can be one that synthesizes 
deoxysugars. 

This invention also provides a method of making a cell (e.g., a bacterial or 
eukaryotic cell) resistant to an enediyne or an enediyne metabolite. This method involves 
expressing in the cell one or more isolated C-1027 open reading frame nucleic acids that 
encode a protein selected from the group consisting of a CagA apoprotein, a SgcB 
transmembrane efflux protein, a transmembrane transport protein, a Na+/H+ transporter, an 
ABC transporter, a glycerol phosphate transporter, and a UvrA-like protein. In preferred 
embodiments, the isolated C-1027 open reading frame nucleic acids are selected from the 
group consisting of ORF 9, ORF 2, ORF 27, ORF 0, ORF 1 C-terminus, ORF 2, and ORF 1 
N-terminus. Certain embodiments exclude cagA (ORF 9). 

In one embodiment, this invention specifically excludes one or more of open 
reading frames -7 through 42. In particular, in one embodiment this invention excludes cagA 
(ORF 9), and/or sgcA (ORF 1), and/or sgcB (ORF 2). 

DEFINITIONS 

The terms "C-1027 open reading frame" and "C-1027 ORF" refer to an open 
reading frame in the C-1027 biosynthesis gene cluster as isolated from Streptomyces 
globisporus. The terms also embrace the same open reading frames as present in other 
enediyne-synthesizing organisms (e.g. other strains and/or species of Streptomyces, 
Actinomyces, and the like). The terms encompass allelic variants and single nucleotide 
polymorphisms (SNPs). In certain instances the term C-1027 ORF is used synonymously with the 
polypeptide encoded by the C-1027 ORF and may include conservative substitutions in that 
polypeptide. The particular usage will be clear from context. 

The terms "isolated", "purified", or "biologically pure" refer to material which 
is substantially or essentially free from components which normally accompany it as found 
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to 
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking 
them in nature. 

The terms "polypeptide", "peptide" and "protein" are used interchangeably 
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers 
in which one or more amino acid residues are an artificial chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid 
polymers. The terms also include variants on the traditional peptide linkage joining the 
amino acids making up the polypeptide. 




The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents 
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the 
present invention is preferably single-stranded or double stranded and will generally contain 
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are 
included that may have alternate backbones, comprising, for example, phosphoramide 
(Beaucage et al. (1993) Tetrahedron 49: 1925 and references therein; Letsinger (1970) J. 
Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986) 
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am. 
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 141), 
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No. 
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321), 
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A 
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and 
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed. 
Engl. 31: 1008; Nielsen (1993) Nature, 365: 566; Carlsson et al. (1996) Nature 380: 207). 
Other analog nucleic acids include those with positive backbones (Dempcy et al. (1995) 
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023, 
5,637,684, 5,602,240, 5,216,141 and 4,469,863; (1991) Angew. Chem. Intl. Ed. English 30: 
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside 
& Nucleotide 13: 1597; Chapters 2 and 3, ACS Symposium Series 580, "Carbohydrate 
Modifications in Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook; Mesmaeker et al. 
(1994), Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR 
34: 17; Tetrahedron Lett. 37: 743 (1996)) and non-ribose backbones, including those described 
in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 
580, Carbohydrate Modifications in Antisense Research, Ed. Y.S. Sanghvi and P. Dan Cook. 
Nucleic acids containing one or more carbocyclic sugars are also included within the 
definition of nucleic acids (see Jenkins et al. (1995), Chem. Soc. Rev. pp. 169-176). Several 
nucleic acid analogs are described in Rawls, C & E News June 2, 1997 page 35. These 
modifications of the ribose-phosphate backbone may be done to facilitate the addition of 
additional moieties such as labels, or to increase the stability and half-life of such molecules 
in physiological environments. 

The term "heterologous", as it relates to nucleic acid sequences such as coding 
sequences and control sequences, denotes sequences that are not normally associated with a 
region of a recombinant construct, and/or are not normally associated with a particular cell. 
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of 
nucleic acid within or attached to another nucleic acid molecule that is not found in 
association with the other molecule in nature. For example, a heterologous region of a 
construct could include a coding sequence flanked by sequences not found in association 
with the coding sequence in nature. Another example of a heterologous coding sequence is a 
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences 
having codons different from the native gene). Similarly, a host cell transformed with a 
construct which is not normally present in the host cell would be considered heterologous for 
purposes of this invention. 

A "coding sequence" or a sequence which "encodes" a particular polypeptide 
(e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or 
translated into that polypeptide in vitro and/or in vivo when placed under the control of 
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding 
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop 
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, 
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or 
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a 
transcription termination sequence will usually be located 3' to the coding sequence. 
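
As an illustration of how a start codon and an in-frame translation stop codon delimit a coding sequence, the following minimal Python sketch scans the three forward reading frames of a DNA string. The function name, the example sequence, and the restriction to ATG as the only start codon are assumptions made for illustration and are not taken from this specification; real genes may use alternative start codons (e.g. GTG), and the reverse strand is not examined here.

```python
# Translation stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequences(dna, start_codon="ATG", min_codons=2):
    """Return (start, end, sequence) tuples for start-to-stop stretches.

    Only the given strand is scanned; 'end' is exclusive and includes
    the stop codon. This is a hypothetical helper, not a gene predictor.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == start_codon:
                # Walk downstream codon by codon until an in-frame stop.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            found.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return found


example_hits = find_coding_sequences("CCATGAAATTTGGGTAACC")
print(example_hits)
```

A stretch with a start codon but no downstream in-frame stop codon is not reported, which mirrors the definition above: both boundaries are needed to delimit the coding sequence.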

Expression "control sequences" refers collectively to promoter sequences, 
ribosome binding sites, polyadenylation signals, transcription termination sequences, 
upstream regulatory domains, enhancers, and the like, which collectively provide for the 
transcription and translation of a coding sequence in a host cell. Not all of these control 
sequences need always be present in a recombinant vector so long as the desired gene is 
capable of being transcribed and translated. 

"Recombination" refers to the reassortment of sections of DNA or RNA 
sequences between two DNA or RNA molecules. "Homologous recombination" occurs 
between two DNA molecules which hybridize by virtue of homologous or complementary 
nucleotide sequences present in each DNA molecule. 

The terms "stringent conditions" or "hybridization under stringent conditions" 
refer to conditions under which a probe will hybridize preferentially to its target 
subsequence, and to a lesser extent to, or not at all to, other sequences. "Stringent 
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid 
hybridization experiments such as Southern and northern hybridizations are sequence 
dependent, and are different under different environmental parameters. An extensive guide 
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in 
Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes part I chapter 
2 Overview of principles of hybridization and the strategy of nucleic acid probe assays, 
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are 
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence 
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength 
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very 
stringent conditions are selected to be equal to the Tm for a particular probe. 
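
The relationship between the thermal melting point and the hybridization temperature described above can be sketched numerically. This specification does not state how the Tm is computed; the sketch below assumes the commonly used Meinkoth and Wahl approximation for long DNA:DNA hybrids, and the function names and example input values are hypothetical.

```python
import math


def melting_temperature(na_molar, percent_gc, percent_formamide, length_bp):
    """Approximate Tm in degrees C for a DNA:DNA hybrid.

    Assumed formula (Meinkoth-Wahl form), not taken from this document:
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


def highly_stringent_temperature(tm, offset=5.0):
    """Highly stringent conditions: about 5 degrees C below the Tm."""
    return tm - offset


# Hypothetical probe: 500 bp, 70% GC, 0.15 M Na+, 50% formamide.
tm = melting_temperature(na_molar=0.15, percent_gc=70.0,
                         percent_formamide=50.0, length_bp=500)
print(round(tm, 1), round(highly_stringent_temperature(tm), 1))
```

Under this assumed formula, adding formamide lowers the computed Tm, which is why formamide-containing hybridization buffers allow stringent hybridization at lower temperatures.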

An example of stringent hybridization conditions for hybridization of 
complementary nucleic acids which have more than 100 complementary residues on a filter 
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the 
hybridization being carried out overnight. An example of highly stringent wash conditions is 
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a 
0.2x SSC wash at 65°C for 15 minutes (see, Sambrook et al. (1989) Molecular Cloning - A 
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor 
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a 
low stringency wash to remove background probe signal. An example medium stringency 
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An 
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at 
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed 
for an unrelated probe in the particular hybridization assay indicates detection of a specific 
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions 
are still substantially identical if the polypeptides which they encode are substantially 
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum 
codon degeneracy permitted by the genetic code. 

Expression vectors are defined herein as nucleic acid sequences that direct 
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in 
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of 
hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression 
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically 
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA 
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed 
expression vector preferably contains: an origin of replication for autonomous replication in 
a host cell, a selectable marker, optionally one or more restriction enzyme sites, optionally 
one or more constitutive or inducible promoters. In preferred embodiments, an expression 
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS 
and/or NRPS domains and/or modules is operably linked to suitable control sequences 
capable of effecting the expression of the products of these synthases and/or synthetases in a 
suitable host. Control sequences include a transcriptional promoter, an optional operator 
sequence to control transcription and sequences which control the termination of 
transcription and translation, and so forth. 

The term "conservative substitution" is used in reference to proteins or 
peptides to reflect amino acid substitutions that do not substantially alter the activity 
(specificity or binding affinity) of the molecule. Typically conservative amino acid 
substitutions involve substitution of one amino acid for another amino acid with similar 
chemical properties (e.g. charge or hydrophobicity). The following six groups each contain 
amino acids that are typical conservative substitutions for one another: 1) Alanine (A), 
Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), 
Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), 
Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). 
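
The six groups listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function names are hypothetical, and treating any residue pair outside the six groups as non-conservative is a simplifying assumption rather than a statement of this specification.

```python
# The six conservative substitution groups, in one-letter amino acid code.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative_substitution(original, replacement):
    """True if two different residues belong to the same group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # an identical residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def count_substitutions(seq_a, seq_b):
    """Return (conservative, non_conservative) counts for aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    conservative = non_conservative = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        if is_conservative_substitution(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


print(count_substitutions("ASDF", "SSKF"))
```

In the usage line, the change from A to S falls within group 1, whereas the change from D to K crosses groups 2 and 4.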

The "group consisting of ORF-1 through ORF 42" refers to the group 
consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, 
ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, 
ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, 
ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, 
ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42 
as identified in Tables I and II. In certain embodiments ORF 9 (cagA) is excluded. 

A "biological molecule that is a substrate for a polypeptide encoded by an
enediyne (e.g., C-1027) biosynthesis gene" refers to a molecule that is chemically modified
by one or more polypeptides encoded by open reading frame(s) of the C-1027 biosynthesis
gene cluster. The "substrate" may be a native molecule that typically participates in the
biosynthesis of an enediyne, or can be any other molecule that can be similarly acted upon
by the polypeptide.

A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e., the
original "allele") whereas other members may have a mutated sequence (i.e., the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele, or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.

"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).
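
The genotype counts stated above for diallelic polymorphisms (three in diploid organisms, two in haploid organisms) are instances of a general counting rule: a genotype is an unordered selection, with repetition, of as many alleles as the organism has chromosome copies. The sketch below illustrates that rule; the function name genotype_count is an assumption introduced here, not a term used in the specification.

```python
from math import comb


def genotype_count(n_alleles: int, ploidy: int) -> int:
    """Number of distinct unordered genotypes.

    A genotype is a multiset of `ploidy` alleles chosen from `n_alleles`
    possibilities, so the count is C(n_alleles + ploidy - 1, ploidy).
    """
    if n_alleles < 1 or ploidy < 1:
        raise ValueError("n_alleles and ploidy must be positive integers")
    return comb(n_alleles + ploidy - 1, ploidy)


# Diallelic diploid: two homozygotes plus one heterozygote.
diallelic_diploid = genotype_count(2, 2)
# Diallelic haploid: one allele or the other.
diallelic_haploid = genotype_count(2, 1)
# Triallelic diploid: three homozygotes plus three heterozygotes.
triallelic_diploid = genotype_count(3, 2)
```

Under this rule a triallelic polymorphism in a diploid organism admits six genotypes.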

Abbreviations used herein include LB, Luria-Bertani; NGDH, dNDP-glucose
4,6-dehydratase; nt, nucleotide; ORF, open reading frame; PCR, polymerase chain reaction;
PEG, polyethyleneglycol; PKS, polyketide synthase; RBS, ribosomal binding site; Apr,
apramycin; R, resistant; Th, thiostrepton; WT, wild-type; and TS, temperature sensitive.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates the structures of C-1027 chromophore and the benzenoid
diradical intermediate proposed to initiate DNA cleavage.

Figure 2 illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of deoxysugars.

Figure 3A illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a β-amino acid.

Figure 3B illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a benzoxazolinate.

Figure 4 illustrates the synthesis of the enediyne core and final assembly of
the C-1027 enediyne.

Figures 5A, 5B, and 5C illustrate the organization of the C-1027 enediyne
biosynthetic gene cluster. Figure 5A shows a restriction map of the 75-kb sgc gene cluster
from S. globisporus as represented by three cosmid clones. Figure 5B illustrates the genetic
organization of the sgcA, sgcB, and cagA genes, showing that they are clustered in the sgc
gene cluster. Probe 1, the 0.55-kb dNDP-glucose 4,6-dehydratase gene fragment from
pBS1002. Probe 2, the 0.73-kb cagA fragment from pBS1003. A, ApaI; B, BamHI; E,
EcoRI; K, KpnI; S, SacII; Sp, SphI. Figure 5C shows the genetic organization of the C-1027
biosynthesis gene cluster.

Figure 6 shows the DNA and deduced amino acid sequences of the 3.0-kb
BamHI fragment from pBS1007, showing the sgcA and sgcB genes. Possible RBSs are
boxed. The presumed translational start and stop sites are in boldface. Restriction enzyme
sites of interest are underlined. The amino acids according to which the degenerate PCR
primers were designed for amplifying the dNDP-glucose 4,6-dehydratase gene from S.
globisporus are underlined.

Figure 7 shows the amino acid sequence alignment of SgcA with three other
dNDP-glucose 4,6-dehydratases. Gdh, TDP-glucose 4,6-dehydratase of S. erythraea
(AAA68211); MtmE, TDP-glucose 4,6-dehydratase in the mithramycin pathway of S.
argillaceus (CAA71847); TylA2, TDP-glucose 4,6-dehydratase in the tylosin pathway of S.
fradiae (S49054). Given in parentheses are protein accession numbers. The αβα fold with
the NAD+-binding motif of GxGxxG is boxed.

Figures 8A, 8B, and 8C show disruption of sgcA by single crossover homologous
recombination. Figure 8A shows construction of the sgcA disruption mutant and restriction
maps of the wild-type S. globisporus C-1027 and S. globisporus SB1001 mutant strains
showing predicted fragment sizes upon BamHI digestion. Figures 8B and 8C show a
Southern analysis of S. globisporus C-1027 (lane 1) and S. globisporus SB1001 (lanes 2, 3,
and 4, three individual isolates) genomic DNA, digested with BamHI, using (Figure 8B)
pOJ260 vector or (Figure 8C) the 0.75-kb SacII/KpnI fragment of sgcA from pBS1012 as a
probe, respectively. B, BamHI; K, KpnI; S, SacII.

Figures 9A, 9B, 9C, and 9D illustrate the determination of C-1027 production
in various S. globisporus strains by assaying their antibacterial activity against M. luteus.
Figure 9A: 1, S. globisporus C-1027; 2, 3, and 4, S. globisporus SB1001 (three individual
isolates); 5, S. globisporus AF67; 6, S. globisporus AF40. Figure 9B: 1, S. globisporus
C-1027; 2, S. globisporus SB1001 (pWHM3); 3 and 4, S. globisporus SB1001 (pBS1015) (two
individual isolates). Both S. globisporus SB1001 (pWHM3) and S. globisporus SB1001
(pBS1015) were grown in the presence of 5 µg/mL thiostrepton. Figure 9C: 1, S.
globisporus C-1027; 2, S. globisporus SB1001 (pBS1015); 3, S. globisporus SB1001; 4, S.
globisporus SB1001 (pWHM3); 5, S. globisporus AF40; 6, S. globisporus AF44. All S.
globisporus strains were grown in the absence of thiostrepton. Figure 9D: 1, S. globisporus
(pKC1139); 2, S. globisporus (pBS1018).

DETAILED DESCRIPTION

This invention provides a complete gene cluster regulating the biosynthesis of
C-1027, the most potent member of the enediyne antitumor antibiotic family. C-1027 is
produced by Streptomyces globisporus C-1027 and consists of an apoprotein (encoded by the
cagA gene) and a non-peptidic chromophore. The C-1027 chromophore could be viewed as
being derived biosynthetically from a benzoxazolinate, a deoxyamino hexose, a β-amino
acid, and an enediyne core. Adopting a strategy to clone the C-1027 biosynthesis gene
cluster by mapping a putative dNDP-glucose 4,6-dehydratase (NGDH) gene to cagA, we
localized 75 kb of contiguous DNA from S. globisporus encoding a complete C-1027 gene
cluster.

Initial sequencing of the cloned gene cluster revealed two genes, sgcA and
sgcB, that encode an NGDH enzyme and a transmembrane efflux protein, respectively, and
confirmed that the cagA gene resides approximately 14 kb upstream of the sgcA,B locus.
The involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing
the sgcA mutants in vivo to restore C-1027 production.

Subsequent DNA sequence analysis provided the complete enediyne C-1027 
gene cluster sequence (SEQ ID NOs: 1 and 2) revealing 50 open reading frames which are 
summarized in Tables I and II. These results represent the first cloning of a gene cluster for 
enediyne anti-tumor antibiotic biosynthesis. 

Table I. Summary of the C-1027 gene cluster open reading frames (-7 to 26), primers for
ORF amplification, and proposed functions

ORF   Size (bp)  Relative position  Primer                                SEQ ID NO.  Proposed function
-7    648        658-11             Fwd: ATG GGC ATG ACG GGT              3           Very weak homology to [illegible]
                                    Rev: CTA GAG GAT CCC GGG              4
-6    549        1478-930           Fwd: [illegible]                      5           [illegible] potentiator [illegible]
                                    Rev: [illegible]                      6
-5    1065       2713-1649          Fwd: ATG ACC ATC GCC ACT              7           N-truncated methionine [illegible] pseudogene
                                    Rev: TCA GAG GCC GAG CAC              8
-4    387        3238-2851          Fwd: ATG AGC TCG CTA CTG              9           Viral transcription factor
                                    Rev: CTA GGA GCC GGT [illegible]      10
-3    1530       4971-3442          Fwd: ATG AGC AGC AGC GCC              11          Viral homolog, possibly primase
                                    Rev: TCA TTC GTC GGC TGC              12
-2    3027       5982-7478          Fwd: GTG AGG GCT CTG CCG              13          Glycerol-phosphate [illegible] transporter [illegible]
                                    Rev: TCA [illegible]                  14
-1    2328       9900-7573          Fwd: GTG AGC GTC [illegible]          15          UvrA-like drug resistance pump
                                    Rev: TCA ACC CGC CCT GCG              16
0     1368       11349-9982         Fwd: [illegible]                      17          Na+/H+ efflux pump
                                    Rev: GTG GCT GTG CTC [illegible]      18
1     999        28590-29588        Fwd: [illegible]                      19          [illegible] dehydratase
                                    Rev: [illegible]                      20
2     1566       [illegible]-31197  Fwd: [illegible]                      21          Transmembrane efflux protein
                                    Rev: TCA TGT [illegible]              22
3     1311       31280-32590        Fwd: GTG GAG TAC TGG AAC              23          Coenzyme F390 synthase, phenylacetyl-CoA ligase
                                    Rev: TCA GGC CTG AGG GGC              24
4     1584       32809-34392        Fwd: GTG CCC CAC GGT GCA              25          Phenol hydroxylase, chlorophenol-4-monooxygenase
                                    Rev: CTA CAG CCC TCC GAG              26
5     -          35274-34458        Fwd: ATG TCT TCA ACC CGT              27          Citrate transport protein
                                    Rev: TCA GCC GCG CAG GAA              28
6     1272       17924-16653        Fwd: ATG CTG GAG AAA TGC              29          C-methyltransferase, hydroxylase
                                    Rev: TCA GAC GAG CTC CTT              30
7     735        16653-15919        Fwd: ATG GAG TAC GGC CCC              31          N-methyltransferase
                                    Rev: TCA TGC CGT GCG CAC              32
8     1233       15922-14690        Fwd: ATG AGC GGC GGC CCG              33          Aminotransferase
                                    Rev: TCA CCT CGC CGG ACG              34
9     432        14643-14212        Fwd: ATG TCG TTA CGT CAC              35          CagA
                                    Rev: TCA GCC GAA GGT CAG              36
10    1068       13012-14079        Fwd: ATG AAG GCA CTT GTA              37          dNTP-glucose synthase
                                    Rev: TCA GGC CGC GAT CTC              38
11    1485       12835-11351        Fwd: GTG GAC GTG TCA GCG              39          Hydroxylase, halogenase
                                    Rev: TCA GGA CCG CGC ACC              40
12    579        25564-24986        Fwd: ATG AAG CCG ATC GGG              41          dNTP-4-keto-6-deoxyglucose 3,5-epimerase
                                    Rev: TCA GGA CGA CTT GTT              42
13    1137       24702-23566        Fwd: ATG CCT TCC CCC TTC              43          3-O-acyltransferase
                                    Rev: TCA GGT GCG CTC GGC              44
14    1455       22878-21424        Fwd: GTG AGA GAC GGC CGG              45          Coenzyme F390 synthase, phenylacetyl-CoA ligase
                                    Rev: TCA CGT GGT GAT GGC              46
15    1482       21407-19926        Fwd: ATG ACC GAC CAG TGC              47          Anthranilate synthase I
                                    Rev: [illegible] CAA [illegible] CTC  48
16    663        19929-19267        Fwd: GTG AGC TTG TGG TCT              49          Anthranilate synthase II
                                    Rev: TCA GGC CGG TTC GGC              50
17    1161       19191-18031        Fwd: GTG CGT CCC TTC CGT              51          Epoxide hydrolase
                                    Rev: TCA GCG GAG CGG ACG              52
18    423        35938-35516        Fwd: ATG CCA GCA CCG ACT              53          Unknown
                                    Rev: TCA GTC GTT GCC GCG              54
19    1380       27214-28593        Fwd: ATG CGG GTG ATG ATC              55          Glycosyltransferase
                                    Rev: TCA TCG GTC CGC CTC              56
20    1356       25815-27170        Fwd: ATG ACC AAG CAC GCC              57          Squalene monooxygenase
                                    Rev: TCA TAC GGC GGC GCC              58
21    672        23546-22875        Fwd: GTG AGC GCA CAA CTC              59          Hypothetical Fe-S flavoprotein
                                    Rev: TCA CGG CTG TGC CTG              60
22    816        35274-34458        Fwd: ATG TCT TCA ACC CGT              61          Haloacetate dehalogenase, hydrolase
                                    Rev: TCA GCC GCG CAG GAA              62
23    1380       37559-38938        Fwd: ATG ACG ACG TCC GAC              63          Peptide synthetase
                                    Rev: TCA GGA GGT GAA GGG              64
24    1620       40986-39367        Fwd: ATG GCA TTG ACT CAA              65          Histidine ammonia lyase
                                    Rev: TCA GCG CAG CTG GAT              66
25    1560       42611-41052        Fwd: ATG ACG CGG CCG GTG              67          Type II adenylation protein
                                    Rev: TCA GCG GGT GAG CCG              68
26    282        38983-39264        Fwd: GTG TCC ACC GTT TCC              69          Type II peptidyl carrier protein
                                    Rev: TCA CTG CGT TCC GGA              70

Table II. C-1027 gene cluster open reading frames (27 to 42), primers for ORF
amplification, and proposed functions

ORF   Relative position  Primer                        SEQ ID NO.  Proposed function
27    43945-46023        Fwd: GTG TGC CCG GTG ACA GAC  71          Antibiotic transporter
                         Rev: TCA GCC CAC GGG CTG GGA  72
28    46167-47171        Fwd: GTG TTG GGC GAT GAG GAC  73          O-methyltransferase
                         Rev: TCA GAC CGC GGA CAT CTG  74
29    47227-48485        Fwd: ATG GCC GGC CTG GTC ATG  75          P450 hydroxylase
                         Rev: TCA GGA CCC GAG GGT CAC  76
30    48610-49714        Fwd: GTG GAC CAG ACG TCT ACG  77          Oxidoreductase
                         Rev: TCA TGC AGG TGC AGC GTG  78
31    [illegible]        Fwd: ATG AGG CCG CTC GTT CGG  79          Unknown protein
                         Rev: TCA TCC CGG CCC GGC GGC  80
32    51420-52341        Fwd: ATG AGA ACG CGG CGA CGC  81          Oxidoreductase
                         Rev: TCA CGG CCG GAG GCG TAC  82
33    53241-54074        Fwd: GTG TAT CAG CCG GAC TGT  83          Unknown protein
                         Rev: CTA CTC ATT CCA GTT GTG  84
34    54230-55379        Fwd: ATG TCT ACG GGC TAT CTC  85          Unknown protein
                         Rev: TCA GCC GCC GGT GGC GCC  86
35    56027-56881        Fwd: ATG TTC TCC CCC GCC GCC  87          Oxidase/dehydrogenase
                         Rev: TCA GTA CGC CTG GTG GGC  88
36    56928-57730        Fwd: ATG AAT TCG CTC GAC GAC  89          Unknown protein
                         Rev: TCA GCT CCC GGT CGC CGC  90
37    57834-58304        Fwd: ATG ACC GCG ACG AAT CCT  91          Regulatory
                         Rev: CTA GGC GGC GCG TCC CGC  92
38    58440-60091        Fwd: ATG AGC ACC ACG GCC GAG  93          Oxidoreductase
                         Rev: TCA GCC GCG CGC CGA CGG  94
39    60092-60622        Fwd: ATG ACC CTG GAG GCC TAC  95          Regulatory
                         Rev: TCA TGC GGG GCT CCC GGT  96
40    60940-62020        Fwd: GTG AAA AGT GAC TCT GCC  97          Regulatory
                         Rev: TCA ACG GCG AGT TGG CTG  98
41    62045-62899        Fwd: GTG ACC ACG AAC ACC ATC  99          Regulatory
                         Rev: TCA CCC GCG ATC TCG ATC  100
42    62788-63164        Fwd: (partial ORF)            101         P450 hydroxylase
                         Rev: TCA CCT CGC CGT ACT CAC  102

Surprisingly, sequence analysis failed to reveal any gene that resembles a
polyketide synthase. The C-1027 open reading frames, however, encode polypeptides
exhibiting a wide variety of enzymatic activities (e.g., epoxide hydrase, monooxygenase,
oxidoreductase, P-450 hydroxylase, etc.). The isolated C-1027 gene cluster can be used to
synthesize C-1027 enediyne antibiotics and/or analogues thereof. The C-1027 gene cluster
can be modified and/or augmented to increase C-1027 and/or C-1027 analogue production.

Alternatively, various components of the C-1027 gene cluster can be used to
synthesize and/or chemically modify a wide variety of metabolites. Thus, for example, ORF
6 (C-methyltransferase) can be used to methylate a carbon, while ORF 12, an epimerase, can
be used to change the configuration of a sugar. The ORFs can be combined in their native
configuration or in modified configurations to synthesize a wide variety of
biomolecules/metabolites. Thus, for example, various combinations of C-1027 open reading
frames can be used to synthesize an enediyne core, to synthesize a deoxy sugar, to synthesize
a β-amino acid, to make a benzoxazolinate, etc. (see, e.g., Figures 2, 3, and 4).

The native C-1027 gene cluster ORFs can be re-ordered, modified, and
combined with other biosynthetic units (e.g., polyketide synthases (PKSs) or catalytic
domains thereof and/or non-ribosomal polypeptide synthetases (NRPSs) or catalytic domains
thereof) to produce a wide variety of molecules. Large chemical libraries can be produced
and then screened for a desired activity.

The C-1027 gene cluster also includes a number of drug resistance genes (see,
e.g., Table III) that confer resistance to C-1027 and/or metabolites involved in C-1027
biosynthesis, thereby permitting the cell to complete the enediyne biosynthesis. These
resistance genes can be used to confer enediyne resistance on a cell lacking such resistance
or to augment the enediyne resistance of a cell that does tolerate enediynes. Such cells can
be used to produce high levels of enediynes, enediyne metabolites, and/or enediyne
analogues.



Table III. C-1027 cluster drug resistance genes.

ORF      Protein                             Mechanism
ORF 9    CagA apoprotein                     Drug sequestering
ORF 2    SgcB transmembrane efflux protein   Drug exporting
ORF 27   Transmembrane transport protein     Drug exporting
ORF 0    Na+/H+ transporter                  Drug exporting
ORF -1   ABC transport (C-terminus)          Drug exporting
ORF -2   Glycerol phosphate transporter      Drug exporting
ORF -1   UvrA-like protein (N-terminus)      DNA repairing

I. Isolation, preparation, and expression of C-1027 nucleic acids.

The C-1027 gene cluster nucleic acids can be isolated, optionally modified,
and inserted into a host cell to create and/or modify a metabolic (biosynthetic) pathway and
thereby enable that host cell to synthesize and/or modify various metabolites. Alternatively,
the C-1027 gene cluster nucleic acids can be expressed in the host cell and the encoded
C-1027 polypeptide(s) recovered for use as chemical reagents, e.g., in the ex vivo synthesis
and/or chemical modification of various metabolites. Either application typically entails
insertion of one or more nucleic acids encoding one or more isolated and/or modified C-1027
enediyne open reading frames in a suitable host cell. The nucleic acid(s) are typically in an
expression vector, a construct containing control elements suitable to direct expression of the
C-1027 polypeptides. The expressed C-1027 polypeptides in the host cell then act as
components of a metabolic/biosynthetic pathway (in which case the synthetic product of the
pathway is typically recovered) or the C-1027 polypeptides themselves are recovered. Using
the sequence information provided herein, cloning and expression of C-1027 nucleic acids
can be accomplished using routine and well known methods.

A) C-1027 nucleic acids.

The nucleic acids comprising the C-1027 gene cluster are identified in Tables
I and II and are listed in the sequence listing provided herein. In particular, Tables I and II
identify genes and functions of open reading frames (ORFs) in the C-1027 enediyne
biosynthesis gene cluster and identify primers suitable for the amplification/isolation of any
one or more of the C-1027 open reading frames. Of course, using the sequence information
provided herein, other primers suitable for amplification/isolation of one or more C-1027
open reading frames can be determined according to standard methods well known to those
of skill in the art (e.g., using Vector NTI Suite™, InforMax, Gaithersburg, MD, USA).
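
As an illustration of how other primers might be derived from the sequence information, the sketch below takes the first codons of an ORF as the forward primer and the reverse complement of its last codons (including the stop codon) as the reverse primer. This layout is an assumption inferred from the legible primers in Tables I and II, whose forward primers begin with a start codon (ATG or GTG) and whose reverse primers mostly begin with the reverse complement of a stop codon (TCA or CTA); it is not stated in the specification. The function names and the toy ORF are hypothetical; the toy ORF merely joins the two ends implied by the ORF -7 primers with an arbitrary filler.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(orf: str, n_codons: int = 5) -> tuple[str, str]:
    """Return (forward, reverse) primers covering the two ends of an ORF.

    forward: the first n_codons codons of the coding strand.
    reverse: the reverse complement of the last n_codons codons.
    """
    n = 3 * n_codons
    if len(orf) % 3 != 0 or len(orf) < n:
        raise ValueError("ORF must be a whole number of codons and at least n_codons long")
    return orf[:n], reverse_complement(orf[-n:])


def as_codons(seq: str) -> str:
    """Format a sequence in space-separated triplets, as in the tables."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical toy ORF: real ends (from the ORF -7 primers) around an arbitrary filler.
toy_orf = "ATGGGCATGACGGGT" + "AAAAAAAAA" + "CCCGGGATCCTCTAG"
fwd, rev = end_primers(toy_orf)
```

Here as_codons(fwd) and as_codons(rev) give "ATG GGC ATG ACG GGT" and "CTA GAG GAT CCC GGG". A practical design would also check melting temperature, secondary structure, and specificity, which this sketch does not attempt.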

Typically such amplifications will utilize the DNA or RNA of an organism
containing the requisite genes (e.g., Streptomyces globisporus) as a template. Typical
amplification conditions include the following PCR temperature program: initial denaturing
at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1 min at 60°C, 2 min at 72°C, followed by
an additional 7 min at 72°C. One of skill will appreciate that optimization of such a protocol,
e.g., to improve yield, etc., is routine (see, e.g., U.S. Patent No. 4,683,202; Innis (1990) PCR
Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA, etc.).
In addition, primers may be designed to introduce restriction sites and so facilitate cloning of
the amplified sequence into a vector.
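
The temperature program above can be written down as data, which makes the total hold time for a given cycle number easy to check. This is a sketch only: the class and function names are assumptions introduced for illustration, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the temperature program described in the text.
INITIAL_DENATURE = Step("initial denaturation", 94, 5 * 60)
CYCLE = (
    Step("denature", 94, 45),
    Step("anneal", 60, 60),
    Step("extend", 72, 2 * 60),
)
FINAL_EXTEND = Step("final extension", 72, 7 * 60)


def program(cycles: int) -> list[Step]:
    """Expand the program into its full ordered list of steps."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    return [INITIAL_DENATURE, *(CYCLE * cycles), FINAL_EXTEND]


def hold_time_minutes(cycles: int) -> float:
    """Total programmed hold time in minutes, excluding ramp times."""
    return sum(step.seconds for step in program(cycles)) / 60


shortest = hold_time_minutes(24)  # lower end of the 24-36 cycle range
longest = hold_time_minutes(36)   # upper end of the 24-36 cycle range
```

With these values the programmed hold time runs from 102 min at 24 cycles to 147 min at 36 cycles.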

In one embodiment, this invention provides nucleic acids for the recombinant
expression of an enediyne (e.g., a C-1027 enediyne or an analogue thereof). Such nucleic
acids include isolated gene cluster(s) comprising open reading frames encoding polypeptides
sufficient to direct the assembly of the enediyne. In other embodiments of this invention, the
C-1027 open reading frames may be unchanged, but the control elements (e.g., promoters,
enhancers, etc.) may be modified. In still other embodiments, the nucleic acids may encode
selected components (e.g., one or more C-1027 or modified C-1027 open reading frames)
and/or may optionally contain other heterologous biosynthetic elements including, but not
limited to, polyketide synthase (PKS) and/or non-ribosomal polypeptide synthetase (NRPS)
modules or enzymatic domains.

Such variations may be introduced by design, for example to modify a known
molecule in a specific way, e.g., by replacing a single substituent of the enediyne with
another, thereby creating a derivative enediyne molecule of predicted structure.
Alternatively, variations can be made randomly, for example by making a library of
molecular variants of a known enediyne by systematically or haphazardly replacing one or
more open reading frames in the biosynthetic pathway. Production of alternative/modified
enediynes, hybrid enediyne PKSs and/or NRPSs, and hybrid systems is described below.

Using the information provided herein, other approaches to cloning the desired
sequences will be apparent to those of skill in the art. For example, the enediyne, and/or
optionally PKS and/or NRPS modules or enzymatic domains of interest can be obtained
from an organism that expresses such, using recombinant methods, such as by screening
cDNA or genomic libraries derived from cells expressing the gene, or by deriving the gene
from a vector known to include the same. The gene can then be isolated and combined with
other desired biosynthetic elements using standard techniques. If the gene in question is
already present in a suitable expression vector, it can be combined in situ with, e.g., other
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather
than cloned. The nucleotide sequence can be designed with the appropriate codons for the
particular amino acid sequence desired. In general, one will select preferred codons for the
intended host in which the sequence will be expressed. The complete sequence can be
assembled from overlapping oligonucleotides prepared by standard methods and assembled
into a complete coding sequence (see, e.g., Edge (1981) Nature 292: 756; Nambair et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259: 6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g., Operon Technologies,
Alameda, CA).
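
The selection of preferred codons for an intended host, mentioned above, can be sketched as a simple back-translation. The codon choices below are an assumption made for illustration: each amino acid is assigned one G- or C-ending codon as a stand-in for a high-GC host such as Streptomyces, and they are not measured codon-usage frequencies. A real design would draw on a codon-usage table for the chosen host. The names PREFERRED_CODON, back_translate, and gc_fraction are hypothetical.

```python
# Illustrative one-codon-per-residue table (assumption, not measured usage data).
# "*" denotes a stop codon.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTC",
    "*": "TGA",
}


def back_translate(protein: str) -> str:
    """Encode a one-letter protein sequence using the table above."""
    try:
        return "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]!r}") from None


def gc_fraction(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in dna.upper()) / len(dna)


designed = back_translate("MAS*")
```

For the hypothetical tripeptide Met-Ala-Ser followed by a stop, this gives ATGGCCTCCTGA; the resulting sequence would then be split into overlapping oligonucleotides for assembly as described in the text.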

Examples of such techniques and instructions sufficient to direct persons of
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to
Molecular Cloning Techniques, Methods in Enzymology 152, Academic Press, Inc., San
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Patent 5,017,478; and
European Patent No. 0,246,864.

B) Expression of C-1027 open reading frames.

The choice of expression vector depends on the sequence(s) that are to be
expressed. Any transducible cloning vector can be used as a cloning vector for the nucleic
acid constructs of this invention. However, where large clusters are to be expressed, it is
preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs, or similar cloning
vectors be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids,
and BACs, for example, are advantageous vectors due to the ability to insert and stably
propagate therein larger fragments of DNA than in M13 phage and lambda phage,
respectively. Phagemids which will find use in this method generally include hybrids
between plasmids and filamentous phage cloning vehicles. Cosmids which will find use in
this method generally include lambda phage-based vectors into which cos sites have been
inserted. Recipient pool cloning vectors can be any suitable plasmid. The cloning vectors
into which pools of mutants are inserted may be identical or may be constructed to harbor and
express different genetic markers (see, e.g., Sambrook et al., supra). The utility of employing
such vectors having different marker genes may be exploited to facilitate a determination of
successful transduction.

In preferred embodiments of this invention, vectors are used to introduce
C-1027 biosynthesis genes or gene clusters into host (e.g., Streptomyces) cells. Numerous
vectors for use in particular host cells are well known to those of skill in the art. For
example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166 all describe
vectors for use in various Streptomyces hosts.

In one preferred embodiment, Streptomyces vectors are used that include
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E.
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.,
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).

The wild-type and/or modified C-1027 enediyne open reading frame(s) of this
invention can be inserted into one or more expression vectors, using methods known to
those of skill in the art. Expression vectors will include control sequences operably linked to
the desired open reading frame. Suitable expression systems for use with the present
invention include systems that function in eukaryotic and/or prokaryotic host cells.
However, as explained above, prokaryotic systems are preferred, and in particular, systems
compatible with Streptomyces spp. are of particular interest. Control elements for use in
such systems include promoters, optionally containing operator sequences, and ribosome
binding sites. Particularly useful promoters include control sequences derived from
enediyne, and/or PKS, and/or NRPS gene clusters. Other promoters (e.g., ermE* as
illustrated in Example 1) are also suitable. Other bacterial promoters, such as those derived
from sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find
use in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature, also function in bacterial host cells. In
Streptomyces, numerous promoters have been described, including constitutive promoters,
such as ErmE and TcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733),
as well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature,
378: 263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et
al. (1995) Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation
of expression of the enediyne open reading frame(s) relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression
vectors. A variety of markers are known which are useful in selecting for transformed cell
lines and generally comprise a gene whose expression confers a selectable phenotype on
transformed cells when the cells are grown in an appropriate selective medium. Such
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid.

The various enediyne cluster open reading frames, and/or PKS, and/or NRPS
clusters or subunits of interest can be cloned into one or more recombinant vectors as
individual cassettes, with separate control elements, or under the control of, e.g., a single
promoter. The various open reading frames can include flanking restriction sites to allow for
the easy deletion and insertion of other open reading frames so that hybrid synthetic
pathways can be generated. The design of such unique restriction sites is known to those of
skill in the art and can be accomplished using the techniques described above, such as
site-directed mutagenesis and PCR.
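
By way of illustration only, the choice of flanking restriction sites for a cassette can be sketched in a few lines of Python. The sketch assumes a small table of common enzymes whose recognition sequences are palindromic (so only one strand needs to be scanned), and it uses a made-up toy sequence rather than any open reading frame of the C-1027 cluster. The function names are hypothetical. It simply reports which enzymes do not cut inside the open reading frame, so that a site added at each end remains unique.

```python
# Recognition sequences of a few common restriction enzymes. All four are
# palindromic, so scanning one strand is sufficient for this sketch.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site in seq."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def usable_flanking_enzymes(orf):
    """Return enzymes whose site is absent from the ORF (hypothetical helper)."""
    return [name for name, site in SITES.items() if count_sites(orf, site) == 0]


def build_cassette(orf, left, right):
    """Flank the ORF with the recognition sites of two chosen enzymes."""
    return SITES[left] + orf.upper() + SITES[right]


# Toy sequence for illustration only (assumption, not from the C-1027 cluster).
# It contains one internal BamHI site, so BamHI is excluded as a flanking enzyme.
toy_orf = "ATGGCCGGATCCACCGTGCTGACCTGA"
candidates = usable_flanking_enzymes(toy_orf)
cassette = build_cassette(toy_orf, candidates[0], candidates[1])
```

A real design would also check the vector backbone and the junctions created on ligation; the sketch only checks the open reading frame and the assembled cassette.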

Methods of cloning and expressing large nucleic acids such as gene clusters,
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc.
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad.
Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some
examples, nucleic acid sequences of well over 100 kb have been introduced into cells,
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al.,
(1998) Genomics, 52: 1-8; Woon et al., (1998) Genomics, 50: 306-316; Huang et al., (1996)
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and expression of C-1027
enediyne is illustrated in Example 1.

C) Host cells.

The vectors described above can be used to express various protein
components of the enediyne, and/or enediyne shunt metabolites, and/or other modified
metabolites for subsequent isolation and/or to provide a biological synthesis of one or more
desired biomolecules (e.g. C-1027 and/or a C-1027 analogue, etc.). Where one or more
proteins of the enediyne biosynthetic gene cluster are expressed (e.g. overexpressed) for
subsequent isolation and/or characterization, the proteins are expressed in any prokaryotic or
eukaryotic cell suitable for protein expression. In one preferred embodiment, the proteins
are expressed in E. coli.

Host cells for the recombinant production of the subject enediynes, enediyne
metabolites, shunt metabolites, etc. can be derived from any organism with the capability of
harboring a recombinant enediyne gene cluster and/or subset thereof. Thus, the host cells of
the present invention can be derived from either prokaryotic or eukaryotic organisms.
Preferred host cells are those of species or strains (e.g. bacterial strains) that naturally
express enediynes. Such host cells include, but are not limited to, Actinomycetes,
Actinoplanetes, Streptomycetes, Actinomadura, Micromonospora, and the like.
Particularly preferred host cells include, but are not limited to, Streptomyces globisporus,
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp.
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces
carzinostaticus, and Actinomycete L585-6. Other suitable host cells include, but are not
limited to, S. verticillus, S. ambofaciens, S. avermitilis, S. azureus, S. cinnamonensis, S.
coelicolor, S. curacoi, S. erythraeus, S. fradiae, S. galilaeus, S. glaucescens, S.
hygroscopicus, S. lividans, S. parvulus, S. peucetius, S. rimosus, S. roseofulvus, S.
thermotolerans, and S. violaceoruber (see, e.g., Hopwood and Sherman (1990) Ann. Rev.
Genet. 24: 37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited,
etc.).

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells,
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and
various myeloma cell lines).

D) Recovery of the expression product.

Recovery of the expression product (e.g., enediyne, enediyne analogue,
enediyne biosynthetic pathway polypeptide, etc.) is accomplished according to standard
methods well known to those of skill in the art. Thus, for example, where enediyne
biosynthetic gene cluster proteins are to be expressed and isolated, the proteins can be
expressed with a convenient tag to facilitate isolation (e.g. a His6 tag). Other standard
protein purification techniques are suitable and well known to those of skill in the art (see,
e.g., Quadri et al. (1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen.
Genet. 232: 313-321, etc.).

Similarly, where components (e.g. enediyne biosynthetic cluster orfs) are used
to synthesize and/or modify various biomolecules (e.g. enediynes, enediyne analogues, shunt
metabolites, etc.) the desired product and/or shunt metabolite(s) are isolated according to
standard methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998)
Biochemistry 37: 2084-2088, Deutscher (1990) Methods in Enzymology Volume 182: Guide
to Protein Purification, M. Deutscher, ed. etc.).

II. Use of C-1027 open reading frames in directed biosynthesis.

Elements (e.g. open reading frames) of the C-1027 biosynthetic gene cluster
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e.
where the process is designed to modify and/or synthesize one or more particular preselected
metabolite(s)). Essentially the entire C-1027 gene cluster can be used to synthesize a C-1027
enediyne and/or a C-1027 enediyne analogue. Individual C-1027 cluster open reading
frames can be used to perform chemical modifications on particular substrates and/or to
synthesize various metabolites. Thus, for example, ORF 6 (C-methyltransferase) can be used
to methylate a carbon, while ORF 7 (N-methyltransferase) can be used to methylate a
nitrogen. ORF 12, an epimerase, can be used to change the conformation of a sugar, and
ORF 8 (an aminotransferase) can be used to aminate a suitable substrate. Similarly,
combinations of C-1027 open reading frames can be used to direct the synthesis of various
metabolites (e.g. β-amino acids, deoxysugars, benzoxazolinates, and the like). These
examples are merely illustrative. One of skill in the art, utilizing the information provided
here, can perform literally countless chemical modifications and/or syntheses using either
"native" enediyne biosynthesis metabolites as the substrate molecule, or other molecules
capable of acting as substrates for the particular enzymes in question. Other substrates can
be identified by routine screening. Methods of screening enzymes for specific activity
against particular substrates are well known to those of skill in the art.

The biosyntheses can be performed in vivo, e.g. by providing a host cell
comprising the desired C-1027 gene cluster open reading frames, and/or in vitro, e.g., by
providing the polypeptides encoded by the C-1027 gene cluster ORFs and the appropriate
substrates and/or cofactors.



A) Synthesis of enediynes and enediyne analogues.

In one embodiment, this invention provides for the synthesis of C-1027
enediynes and/or C-1027 analogues or derivatives. In a preferred embodiment, this is
accomplished by providing a cell comprising a C-1027 gene cluster and culturing the cell
under conditions whereby the desired enediyne or enediyne analogue is synthesized. The
cell can be a cell that does not normally synthesize an enediyne and the entire gene cluster
can be transfected into the cell. Alternatively, a cell that typically synthesizes enediynes can
be utilized and all or part of the C-1027 gene cluster can be introduced into the cell.

Enediyne derivatives/analogues can be produced by varying the order of, or
kind of, gene cluster subunits present in the cell, and/or by changing the host cell (e.g. to a
eukaryotic cell that glycosylates the biosynthetic product), and/or by providing altered
metabolites (e.g. adding exogenous aglycones to a host that carries a gene cassette of the
deoxysugar biosynthesis and glycosylation genes for the production of glycosylated
metabolites), etc.

In certain embodiments, the host cell need not be transfected with an entire
C-1027 gene cluster. Rather, various components of a C-1027 gene cluster can be altered
within a cell already harboring a C-1027 cluster. By varying or adding various biosynthetic
open reading frames, C-1027 enediyne variants can be produced.

Standard techniques of molecular biology (gene disruption, gene
replacement, gene supplement) can be used to modulate and/or otherwise alter enediyne
and/or other metabolite (e.g. shunt metabolite) production in an organism that naturally
synthesizes an enediyne (e.g. S. globisporus) or an organism that is modified to synthesize an
enediyne.

In addition, or alternatively, control sequences that alter the expression of
various open reading frames can be introduced to alter the amount and/or timing of
enediyne production. Thus, for example, by placing particular C-1027 open reading frames
under control of a constitutive promoter (ermE*), C-1027 production was increased by as
much as 4-fold (see, e.g., Table 3 and Example 1).

Table 3. Alteration of C-1027 production by engineering the C-1027 biosynthesis gene
cluster.

Strain                      Yield (%)
WT                          100
WT/pKC1139                  100
WT/ermE*/ORF 2              >150
WT/ORF 9                    >100
WT/ermE*/ORF 9              <10
WT/ORF 10, 11               >100
WT/ermE*/ORF 10, 11         >100
WT/ORF 9, 10, 11            >400

ORF 2: transmembrane efflux protein; ORF 9: CagA apoprotein; ORF 10: TDP-glucose
synthase; ORF 11: hydroxylase/halogenase

Where enediyne analogues are synthesized, it will often prove desirable to
assay them for biological activity. Such assays are well known to those of skill in the art.
One such assay is illustrated in Example 1. Briefly, this example depicts an assay of
antibacterial activity against M. luteus as described by Hu et al. (1988) J. Antibiot. 41:
1575-1579. Other suitable assays for enediyne activity will be known to those of skill in the art.

B) Use of C-1027 open reading frames to synthesize an enediyne core. 

The C-1027 open reading frames described herein, or variants thereof, can be
used to synthesize an enediyne core, e.g., from a fatty acid precursor. One such synthetic
pathway is illustrated in Figure 4. This reaction scheme utilizes ORF 17 (epoxide hydrase),
ORF 20 (monooxygenase), ORF 21 (iron-sulfur flavoprotein), ORF 29 (P-450 hydroxylase),
ORF 30 (oxidoreductase), ORF 32 (oxidoreductase), ORF 35 (proline oxidase), and ORF 38
(P-450 hydroxylase) to synthesize an enediyne core.

This synthetic pathway is not considered limiting, but merely illustrative.
Using this as a model, one of ordinary skill in the art can design numerous other synthetic
schemes to produce enediyne cores and/or core variants.

C) Use of C-1027 open reading frames to synthesize deoxy sugars.

The biosynthesis of various deoxy sugars (e.g., deoxyhexoses) typically shares
a common key intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs,
whose formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme,
an NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48:
223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York.). Similarly, the C-1027 gene cluster includes an NGDH
enzyme which can be exploited to synthesize a variety of deoxy sugars.

One illustrative synthetic pathway is shown in Figure 2. This biosynthetic
scheme utilizes ORF 10 (dNDP-glucose synthase), ORF 1 (dNDP-glucose dehydratase),
ORF 12 (epimerase), ORF 8 (aminotransferase), ORF 6 (C-methyltransferase), ORF 7
(N-methyltransferase) and ORF 19 (glycosyl transferase).

This synthetic pathway is not considered limiting, but merely illustrative.
Using this as a model, one of ordinary skill in the art can design numerous other synthetic
schemes to produce various deoxy sugars.

D) Use of C-1027 open reading frames to synthesize β-amino acids.

In still another embodiment, C-1027 biosynthetic polypeptides can be used in
the biosynthesis of β-amino acids. One illustrative synthetic pathway is shown in Figure 3A.
This biosynthetic scheme utilizes ORF 4 (hydroxylase), ORF 11 (hydroxylase/halogenase),
ORF 24 (aminomutase), ORF 23 (type II NRPS condensation enzyme), ORF 25 (type II
NRPS adenylation enzyme), and ORF 26 (type II peptidyl carrier protein).

Again, this synthetic pathway is not considered limiting, but merely
illustrative. Using this as a model, one of ordinary skill in the art can design numerous other
synthetic schemes to produce other β-amino acids.

E) Use of C-1027 open reading frames to synthesize benzoxazolinates. 

The C-1027 open reading frames can also be used to synthesize a
benzoxazolinate. One illustrative synthetic pathway is shown in Figure 3B. This
biosynthetic scheme utilizes ORF 15 (anthranilate synthase I), ORF 16 (anthranilate synthase
II), ORF 4 (phenol hydroxylase/chlorophenol-4-monooxygenase), ORF 11
(hydroxylase/halogenase), ORF 28 (O-methyltransferase), ORF 3 (coenzyme F390
synthetase), ORF 14 (coenzyme F390 synthetase), and ORF 13 (O-acyltransferase). Again,
this synthetic pathway is not considered limiting, but merely illustrative. Using this as a
model, one of ordinary skill in the art can design numerous other synthetic schemes to
produce other benzoxazolinates.

III. Generation of chemical diversity.

In addition to the directed modification and/or biosynthesis of various
metabolites as described above, the C-1027 biosynthetic gene cluster open reading frames
can be utilized, by themselves or in combination with other biosynthetic subunits (e.g. NRPS
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems) to
produce a wide variety of compounds including, but not limited to, various enediynes or
enediyne derivatives, various polyketides, polypeptides, polyketide/polypeptide hybrids,
various thiazoles, various sugars, various methylated polypeptides/polyketides, and the like.

As with the directed production of various metabolites described above, such
compounds can be produced, in vivo or in vitro, by catalytic biosynthesis, e.g., using large
enediyne cluster units and/or modular PKSs, NRPSs, and hybrid PKS/NRPS systems. In a
preferred embodiment, large combinatorial libraries of cells harboring various
megasynthetases can be produced by the random or directed modification of particular
pathways and then selected for the production of a molecule or molecules of interest. It will
be appreciated that, in certain embodiments, such libraries of megasynthetases/modified
pathways can be used to generate large, complex combinatorial libraries of compounds
which themselves can be screened for a desired activity.

Such combinatorial libraries can be created by the deliberate
modification/variation of selected biosynthetic pathways and/or by random/haphazard
modification of such pathways.

A) Directed engineering of novel synthetic pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides,
and combinations thereof are created by modifying the enediyne gene cluster ORFs and/or
known PKSs, and/or NRPSs so as to introduce variations into metabolites synthesized by the
enzymes. Such variations may be introduced by design, for example to modify a known
molecule in a specific way, e.g. by replacing a single monomeric unit within a polymer with
another, thereby creating a derivative molecule of predicted structure. Such variations can
also be made by adding one or more modules or enzymatic domains to a known PKS or
NRPS or enediyne cluster, or by removing one or more modules from a known PKS or
NRPS.

Using any of these methods, it is possible to introduce PKS domains, NRPS
domains, and enediyne domains into a megasynthetase. Mutations can be made to the
native enediyne, and/or NRPS, and/or PKS subunit sequences and such mutants used in
place of the native sequence, so long as the mutants are able to function with other subunits
(domains) in the synthetic pathway. Such mutations can be made to the native sequences
using conventional techniques such as by preparing synthetic oligonucleotides including the
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS
subunit using restriction endonuclease digestion (see, e.g., Kunkel, (1985) Proc. Natl. Acad.
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length)
which hybridizes to the native nucleotide sequence (generally cDNA corresponding to the
RNA sequence), at a temperature below the melting temperature of the mismatched duplex.
The primer can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located (Zoller and Smith
(1983) Meth. Enzymol. 100: 468). Primer extension is effected using DNA polymerase, the
product is cloned, and clones containing the mutated DNA, derived by segregation of the
primer-extended strand, are selected. Selection can be accomplished using the mutant primer
as a hybridization probe. The technique is also applicable for generating multiple point
mutations (see, e.g., Dalbadie-McFarland et al. (1982) Proc. Natl. Acad. Sci. USA 79: 6409).
PCR mutagenesis will also find use for effecting the desired mutations.
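
By way of illustration only, the design constraints described above for a mismatched primer (a length of roughly 10-20 nucleotides and a centrally located mutant base) can be sketched in Python. The template below is a made-up sequence, the function names are hypothetical, and the melting temperature is estimated with the simple Wallace rule (2 degrees per A or T, 4 degrees per G or C), which is only a rough approximation for short oligonucleotides and does not account for the mismatch itself. The primer is written in the same sense as the template for readability; in practice the strand that anneals would be chosen as appropriate.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def mutagenic_primer(template, pos, new_base, length=17):
    """Return a primer of odd `length` centered on 0-based `pos`, carrying `new_base` there."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the mismatch sits exactly in the middle")
    half = length // 2
    if pos - half < 0 or pos + half >= len(template):
        raise ValueError("mutation too close to the template end for a centered primer")
    if template[pos].upper() == new_base.upper():
        raise ValueError("new base equals the template base; no mismatch introduced")
    window = template[pos - half:pos + half + 1].upper()
    # Replace only the central base; the flanking bases still match the template.
    return window[:half] + new_base.upper() + window[half + 1:]


# Hypothetical template for illustration (assumption, not a C-1027 sequence).
template = "ATGACCGGTCTGGAAGCGCTGACCGTTCGTGAACTGGCG"
primer = mutagenic_primer(template, pos=18, new_base="A")
estimated_tm = wallace_tm(primer)
```

An annealing temperature somewhat below the estimate would then be chosen, consistent with hybridizing below the melting temperature of the mismatched duplex.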

B) Random modification of enediyne pathways.

In another embodiment, variations can be made randomly, for example by
making a library of molecular variants (e.g. of a known enediyne) by randomly mutating one
or more elements of the subject gene cluster or by randomly replacing one or more open
reading frames in a gene cluster with one or more alternative open reading frames.
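
By way of illustration only, the size of a library obtained by replacing open reading frames with alternatives grows as the product of the number of options at each position. The following Python sketch enumerates such a library for three hypothetical positions; the slot names and the "alt-*" labels are placeholders introduced here and do not correspond to open reading frames described in this document.

```python
from itertools import product
from math import prod

# Hypothetical alternatives for three positions in a pathway. The slot names and
# the "alt-*" labels are placeholders, not ORFs described in this document.
SLOTS = {
    "slot_1": ["native", "alt-1a", "alt-1b"],
    "slot_2": ["native", "alt-2a"],
    "slot_3": ["native", "alt-3a", "alt-3b", "alt-3c"],
}


def library_size(slots):
    """Number of distinct combinations: product of the option counts per slot."""
    return prod(len(options) for options in slots.values())


def enumerate_library(slots):
    """Yield every combination as a dict mapping slot name to the chosen option."""
    names = list(slots)
    for combo in product(*(slots[name] for name in names)):
        yield dict(zip(names, combo))


variants = list(enumerate_library(SLOTS))
```

With three, two, and four options the library contains 3 x 2 x 4 = 24 combinations, one of which is the unmodified (all "native") pathway.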

The various open reading frames can be combined into a single multi-modular
enzyme, thereby dramatically increasing the number of possible combinations obtained using
these methods. These combinations can be made using standard recombinant or nucleic acid
amplification methods, for example by shuffling nucleic acid sequences encoding various
modules or enzymatic domains to create novel arrangements of the sequences, analogous to
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S.
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro,
for example by combinatorial synthetic methods. Novel molecules or molecule libraries can
be screened for any specific activity using standard methods.

Random mutagenesis of the nucleotide sequences obtained as described above
can be accomplished by several different techniques known in the art, such as by altering
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid,
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine,
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E.
coli and propagated as a pool or library of mutant plasmids.
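
By way of illustration only, the notion of a pool of randomly mutated sequences can be modeled computationally. The Python sketch below applies independent point substitutions at a fixed per-base rate to a made-up parent sequence; it is a simplified stand-in for the chemical, radiation, or error-prone PCR methods named above, not a model of any one of them, and the function names, substitution rate, and pool size are assumptions chosen for the example.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate` (never with itself)."""
    out = []
    for base in seq.upper():
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutant_pool(seq, rate, size, seed=0):
    """Build a reproducible pool of `size` independently mutated copies of `seq`."""
    rng = random.Random(seed)
    return [mutate(seq, rate, rng) for _ in range(size)]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical parent sequence and parameters (assumptions for illustration).
parent = "ATGACCGGTCTGGAAGCGCTGACCGTTCGT"
pool = mutant_pool(parent, rate=0.05, size=1000)
mean_changes = sum(hamming(parent, m) for m in pool) / len(pool)
```

At a rate of 0.05 on a 30-base parent, the mean number of changes per mutant would be expected to lie near 1.5; a higher rate gives a more diverse pool at the cost of more non-functional variants.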

Large populations of random enzyme variants can be constructed in vivo
using "recombination-enhanced mutagenesis." This method employs two or more pools of,
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are
generated using any convenient mutagenesis technique, described more fully above, and then
inserted into cloning vectors.

C) Incorporation and/or modification of non-C-1027 cluster elements.

In either the directed or random approaches, nucleic acids encoding novel
combinations of gene cluster ORFs are introduced into a cell. In one embodiment, nucleic
acids encoding one or more enediyne synthetic cluster ORFs and/or PKS and/or NRPS
domains are introduced into a cell so as to replace one or more domains of an endogenous
gene cluster within a cell. Endogenous gene replacement can be accomplished using
standard methods, such as homologous recombination. Nucleic acids encoding an entire
enediyne, enediyne ORF, PKS, NRPS, or combination thereof can also be introduced into a
cell so as to enable the cell to produce the novel enzyme, and, consequently, synthesize the
novel polymer. In a preferred embodiment, such nucleic acids are introduced into the cell
optionally along with a number of additional genes, together called a 'gene cluster,' that
influence the expression of the genes, survival of the expressing cells, etc. In a particularly
preferred embodiment, such cells do not have any other enediyne and/or PKS- and/or NRPS-
encoding genes or gene clusters, thereby allowing the straightforward isolation of the
molecule(s) synthesized by the genes introduced into the cell.

Furthermore, the recombinant vector(s) can include genes from a single
enediyne and/or PKS and/or NRPS gene cluster, or may comprise hybrid replacement PKS
gene clusters with, e.g., a gene for one cluster replaced by the corresponding gene from
another gene cluster. For example, it has been found that ACPs are readily interchangeable
among different synthases without an effect on product structure. Furthermore, a given KR
can recognize and reduce polyketide chains of different chain lengths. Accordingly, these
genes are freely interchangeable in the constructs described herein. Thus, the replacement
clusters of the present invention can be derived from any combination of PKS and/or NRPS
gene sets that ultimately function to produce an identifiable polyketide.

Examples of hybrid replacement clusters include, but are not limited to,
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster,
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas),
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin,
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin,
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24:
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)

A number of hybrid gene clusters have been constructed, having components
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146).
Other hybrid gene clusters, as described above, can easily be produced and screened using
the disclosure herein, for the production of identifiable polyketides, polypeptides or
polyketide/polypeptide hybrids.

Host cells (e.g. Streptomyces) can be transformed with one or more vectors,
collectively encoding a functional PKS/NRPS set, or a cocktail comprising a random
assortment of enediyne ORFs and/or PKS and/or NRPS genes, modules, active sites, or
portions thereof. The vector(s) can include native or hybrid combinations of enediyne ORFs,
and/or PKS and/or NRPS subunits or cocktail components, or mutants thereof. As explained
above, the gene cluster need not correspond to the complete native gene cluster but need only
encode the necessary enediyne ORFs and/or PKS and/or NRPS components to catalyze the
production of the desired product(s).

IV. Variation of starter and/or extender units, and/or host cells.

In addition to varying the nucleic acids comprising the subject gene cluster,
variations in the products produced by the gene cluster(s) can be obtained by varying the
host cell, the starter units and/or the extender units. Thus, for example, different fatty acids
can be utilized in the enediyne synthetic pathway resulting in different enediyne variants.
Similarly, different intermediate metabolites can be provided (e.g. endogenously produced by
the host cell, or produced by an introduced heterologous construct, and/or supplied from an
exogenous source (e.g. the culture media)). Similarly, varying the host cell can vary the
resulting product(s). For example, a gene cassette carrying the enediyne biosynthesis genes
can be introduced into a deoxysugar-synthesizing host for the production of glycosylated
enediyne metabolites.

V. Use of C-1027 resistance genes. 

The antibiotic C-1027 and metabolites present in C-1027 biosynthesis are
highly potent cytotoxins. Accordingly, the biosynthesis of C-1027 is facilitated by the
presence of one or more antibiotic (e.g. enediyne) resistance genes. Without being bound to
a particular theory, it is believed that CagA and SgcB function cooperatively to provide
resistance. It is believed that the C-1027 chromophore is first sequestered by binding to the
preapoprotein CagA (ORF 9) to form a complex, which is then transported out of the cell by
the efflux pump SgcB (ORF 2) and processed by removing the leader peptide to yield the
chromoprotein. Other genes that appear to mediate resistance in the C-1027 biosynthesis
gene cluster include a transmembrane transport protein (ORF 27), a Na+/H+ transporter (ORF
0), an ABC transporter (ORF -1, C-terminus), a glycerol phosphate transporter (ORF -2), and
a UvrA-like protein (ORF -1, N-terminus) (see, e.g., Table 2).

These ORFs and/or the polypeptides encoded by these ORFs can be utilized
alone, or in combination with one or more other C-1027 ORFs, to confer resistance to
enediyne or enediyne metabolites on a cell. This is useful in a wide variety of contexts, for
example, to increase production of enediynes. It is believed that C-1027
resistance could be a limiting factor at the onset of C-1027 production. Provision of an extra,
plasmid-borne copy of sgcB, and overexpression of sgcB under the control of the
constitutive ermE* promoter, resulted in an increase of C-1027 production (see Example 1).

In a therapeutic context, it is sometimes desirable to confer resistance on
certain vulnerable cells. Thus, for example, where an enediyne is used as a
chemotherapeutic, transfection of vulnerable, but healthy cells (e.g. liver cells remote from
the tumor site, stem cells, etc.) with vector(s) expressing the resistance gene(s) permits
administration of the enediyne at a higher dosage with fewer adverse effects to the organism.
Such approaches have been taken using the multi-drug resistance gene (MDR1) expressing
P-glycoprotein.

In another embodiment, vectors are provided containing one or more
resistance genes of this invention under control of a constitutive and/or inducible promoter
thereby providing a "ready-made" expression system suitable for the expression of an
enediyne or enediyne metabolite at high concentration.

It is also noted that the resistance genes are expected to confer resistance to
compounds other than enediynes. The resistance genes are expected to confer resistance to
essentially any cytotoxic compound that can act as a substrate for the resistance gene(s) of
this invention.

VI. Kits. 

In still another embodiment, this invention provides kits for practice of the
methods described herein. In one preferred embodiment, the kits comprise one or more
containers containing nucleic acids encoding one or more of the C-1027 biosynthesis gene
cluster open reading frames. Certain kits may comprise vectors encoding the sgc gene
cluster orfs and/or cells containing such vectors. The kits may optionally include any
reagents and/or apparatus to facilitate practice of the methods described herein. Such
reagents include, but are not limited to, buffers, labels, labeled antibodies, bioreactors, cells,
etc.

In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional
materials provide protocols utilizing the kit contents for creating or modifying a C-1027 gene
cluster and/or for synthesizing or modifying a molecule using one or more sgc gene cluster
ORFs. While the instructional materials typically comprise written or printed materials, they
are not limited to such. Any medium capable of storing such instructions and
communicating them to an end user is contemplated by this invention. Such media include,
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips),
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet
sites that provide such instructional materials.

EXAMPLES

The following examples are offered to illustrate, but not to limit the claimed
invention.

Example 1 

Genes for production of the enediyne antitumor antibiotic C-1027 in Streptomyces
globisporus are clustered with the cagA gene that encodes the C-1027 apoprotein

We have been studying the biosynthesis of C-1027 in Streptomyces
globisporus C-1027 as a model for the enediyne family of antitumor antibiotics (Thorson et
al. (1999) Bioorg. Chem., 27: 172-188). C-1027 consists of a non-peptidic chromophore and
an apoprotein, CagA [also called C-1027AG (Otani et al. (1991) Agric. Biol. Chem. 55:
407-417)]. The C-1027 chromophore is extremely unstable in the protein-free state; its structure
was initially deduced from an inactive but more stable degradation product
(Minami et al. (1993) Tetrahedron Lett. 34: 2633-2636) and subsequently confirmed by
spectroscopic analysis of the natural product (Yoshida et al. (1993) Tetrahedron Lett. 34:
2637-2640) (Fig. 1). While the absolute stereochemistry of the deoxy sugar moiety was
established by total synthesis (Iida et al. (1993) Tetrahedron Lett. 34: 4079-4082), the 8S,
9S, 13S and 17R configurations of the C-1027 chromophore were based only on computer
modeling (Okuno et al. (1994) J. Med. Chem. 37: 2266-2273). Although no biosynthetic
study has been carried out specifically on C-1027, the polyketide origin of the enediyne
cores has been implicated by feeding experiments with 13C-labeled acetate for the
neocarzinostatin chromophore A (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299),
dynemicin (Tokiwa et al. (1992) J. Am. Chem. Soc. 114: 4107-4110), and esperamicin (Lam
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345); and deoxysugar biosynthesis has been
well characterized in actinomycetes (Liu and Thorson (1994) Ann. Rev. Microbiol. 48:
223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York). Given the structural similarity of C-1027 to the other
enediyne cores and to deoxysugars found in other secondary metabolites, we decided to
clone either a PKS or a deoxysugar biosynthesis gene as the first step of identifying the
C-1027 gene cluster from S. globisporus.

Furthermore, the CagA apoprotein of C-1027 has been isolated, its amino acid
sequence has been determined, and the corresponding cagA gene has been cloned and
sequenced (Otani et al. (1991) Agric. Biol. Chem. 55: 407-417; Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). Since genes encoding secondary metabolite production
in actinomycetes have invariably been found to be clustered in one region of the microbial
chromosome (Hopwood (1997) Chem. Rev. 97: 2465-2497), we further reasoned that
mapping the cagA gene with either a putative PKS gene, a deoxysugar biosynthesis gene, or
both to the same region of the S. globisporus chromosome should be viewed as strong
evidence supporting the proposition that the cloned genes constitute the C-1027 biosynthesis
gene cluster.

We report here the cloning and sequencing of two genes, sgcA (Streptomyces
globisporus C-1027) and sgcB, that encode a dNDP-glucose 4,6-dehydratase (NGDH) and a
transmembrane efflux protein, respectively. The sgcA,B locus is indeed clustered with the
cagA gene, leading to the localization of a 75-kb gene cluster from S. globisporus. The
involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing
the sgcA mutants in vivo to restore C-1027 production. Our results, together with a similar
effort in the Thorson laboratory on the calicheamicin gene cluster (Thorson et al. (1999)
Bioorg. Chem., 27: 172-188), represent the first cloning of a gene cluster for enediyne
antitumor antibiotic biosynthesis.

Materials and methods. 

Bacterial strains and plasmids. 

Escherichia coli DH5α was used as a general host for routine subcloning
(Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY). E. coli XL1-Blue MR (Stratagene, La Jolla, CA)
was used as the transduction host for cosmid library construction. E. coli S17-1 was used as
the donor host for E. coli-S. globisporus conjugation (Mazodier et al. (1989) J. Bacteriol.
171: 3583-3585). Micrococcus luteus ATCC 9341 was used as the test organism to assay
the antibacterial activity of C-1027 (Hu et al. (1988) J. Antibiot. 41: 1575-1579). The
pGEM-3zf, -5zf, and -7zf and pGEM-T vectors were from Promega (Madison, WI). S.
globisporus strains and other plasmids used in this study are listed in Table 3.

Table 3. Strains and plasmids.

Strain or plasmid   Relevant characteristics

S. globisporus

C-1027      Wild-type (Hu et al. (1988) J. Antibiot. 41: 1575-1579)
AF40        Mutant resulting from acriflavine treatment of S. globisporus C-1027,
            C-1027-nonproducing (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199)
AF44        Mutant resulting from acriflavine treatment of S. globisporus C-1027,
            C-1027-nonproducing (Mao et al., supra)
AF67        Mutant resulting from acriflavine treatment of S. globisporus C-1027,
            C-1027-nonproducing (Mao et al., supra)
SB1001      sgcA-disrupted mutant resulting from integration of pBS1012 into
            S. globisporus C-1027, AprR, C-1027-nonproducing
SB1002      sgcA-disrupted mutant resulting from integration of pBS1013 into
            S. globisporus C-1027, AprR, C-1027-nonproducing

Plasmids:

pOJ446      E. coli-Streptomyces shuttle cosmid, AprR (Bierman et al. (1992) Gene 116: 43-69)
pOJ260      E. coli vector, non-replicating in Streptomyces, AprR (Bierman et al., supra)
pKC1139     E. coli-Streptomyces shuttle vector, repTS, AprR (Bierman et al., supra)
pWHM3       E. coli-Streptomyces shuttle vector, ThR (Vara et al. (1989) J. Bacteriol. 171:
            5872-5881)
pWHM79      ermE* promoter in pGEM-3zf (Shen and Hutchinson (1996) Proc. Natl. Acad.
            Sci. USA 93: 6600-6604)
pBS1001     0.75-kb PCR product amplified from S. globisporus with type I PKS primers in
            pGEM-T
pBS1002     0.55-kb PCR product amplified from S. globisporus with NGDH gene primers in
            pGEM-T
pBS1003     0.73-kb PCR product amplified from pBS1005 with cagA primers in pGEM-T
pBS1004     pOJ446 S. globisporus genomic library cosmid
pBS1005     pOJ446 S. globisporus genomic library cosmid
pBS1006     pOJ446 S. globisporus genomic library cosmid
pBS1007     3.0-kb BamHI fragment from pBS1005 in pGEM-3zf, sgcA, sgcB
pBS1008     4.0-kb BamHI fragment from pBS1005 in pGEM-3zf, cagA
pBS1009     1.0-kb KpnI truncated fragment of sgcA from pBS1007 in pGEM-3zf
pBS1010     0.75-kb SacII/SphI internal fragment of sgcA from pBS1009 in pGEM-5zf
pBS1011     0.75-kb SacI/SphI internal fragment of sgcA from pBS1010 in pGEM-3zf
pBS1012     0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pOJ260
pBS1013     0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pKC1139
pBS1014     2.0-kb EcoRI/SphI fragment from pBS1007 in the SmaI/SphI sites of pWHM79,
            ermE* sgcA
pBS1015     2.5-kb EcoRI/HindIII fragment from pBS1014 in pWHM3, ermE* sgcA
pBS1016     Self-ligation of the 5.2-kb KpnI fragment from pBS1007
pBS1017     0.45-kb EcoRI/SacI fragment from pWHM79 in the EcoRI/SacI sites of pBS1016,
            ermE* sgcB
pBS1018     2.5-kb EcoRI/HindIII fragment from pBS1017 in pKC1139, ermE* sgcB



Biochemicals and chemicals. 

Ampicillin, apramycin, nalidixic acid, and thiostrepton were from Sigma (St. 
Louis, MO). Unless specified otherwise, restriction enzymes and other molecular biology 
reagents were from standard commercial sources. 

Media and culture conditions. 

E. coli strains carrying plasmids were grown in Luria-Bertani (LB) medium
and were selected with appropriate antibiotics. S. globisporus strains were grown on ISP-4
(Difco Laboratories, Detroit, MI) or R2YE at 28°C for sporulation and in TSB (Hopwood et
al. (1985) Genetic manipulation of Streptomyces: a laboratory manual. John Innes
Foundation, Norwich, UK) supplemented with 5 mM MgCl2 and 0.5% glycine at 28°C, 250
rpm for isolation of genomic DNA. For transformation, S. globisporus strains were grown in
YEME (Hopwood et al., supra) for preparation of protoplasts and on R2YE for protoplast
regeneration. For conjugation, both the E. coli S17-1 donors and the S. globisporus
recipients (upon germination in TSB) were prepared in LB, and donors/recipients were
grown on either ISP-4 medium with 0.05% yeast extract and 0.1% tryptone or AS-1 medium
(Baltz (1980) Dev. Ind. Microbiol. 21: 43-54; Bierman et al. (1992) Gene 116: 43-69) at
30°C for isolation of exconjugants.

For C-1027 production, S. globisporus strains were grown either on R2YE or
ISP-4 agar medium at 28°C or in liquid medium by a two-stage fermentation. For liquid
culture, the seed inoculum was prepared by inoculating 50 mL medium (consisting of 2%
glycerol, 2% dextrin, 1% fish meal, 0.5% peptone, 0.2% (NH4)2SO4, and 0.2% CaCO3, pH
7.0) with an aliquot of spore suspension, incubating at 28°C, 250 rpm for two days. To a
fresh 50 mL of the same medium was then added the seed culture (5%), and incubation
continued at 28°C, 250 rpm for three to six days (Hu et al. (1988) J. Antibiot. 41: 1575-1579).
The fermentation supernatants were harvested by centrifugation (Eppendorf 5415C,
4°C, 10 min, 14,000 rpm) on days 3, 4, and 5, and assayed for their antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579).
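
As a small worked illustration of the two-stage fermentation recipe above, the following sketch scales the seed medium to a given volume and computes the 5% inoculum. It assumes the percentages are weight per volume (g per 100 mL), which the text does not state explicitly (glycerol in particular may have been measured by volume); the function and variable names are illustrative only.

```python
# Seed-medium components as listed in the text, in percent.
# Assumption: percentages are treated as % (w/v), i.e. grams per 100 mL.
SEED_MEDIUM_PERCENT = {
    "glycerol": 2.0,
    "dextrin": 2.0,
    "fish meal": 1.0,
    "peptone": 0.5,
    "(NH4)2SO4": 0.2,
    "CaCO3": 0.2,
}


def grams_for_volume(volume_ml: float) -> dict:
    """Return grams of each component needed for `volume_ml` of medium."""
    return {name: pct * volume_ml / 100.0 for name, pct in SEED_MEDIUM_PERCENT.items()}


def inoculum_ml(volume_ml: float, percent: float = 5.0) -> float:
    """Return the seed-culture volume for a given inoculum percentage (v/v)."""
    return percent * volume_ml / 100.0


if __name__ == "__main__":
    for name, grams in grams_for_volume(50).items():
        print(f"{name}: {grams:.2f} g per 50 mL")
    print(f"seed culture: {inoculum_ml(50):.1f} mL into 50 mL of fresh medium")
```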

DNA isolation and manipulation. 

Plasmid preparation and DNA extraction were carried out by using
commercial kits (Qiagen, Santa Clarita, CA). Total S. globisporus DNA was isolated
according to literature protocols (Hopwood et al. (1985) Genetic manipulation of
Streptomyces: a laboratory manual. John Innes Foundation, Norwich, UK; Rao et al. (1987)
Methods Enzymol. 153: 166-198). Restriction endonuclease digestion and ligation followed
standard methods (Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY). For Southern analysis, digoxigenin
labeling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).

DNA sequencing. 

Automated DNA sequencing was carried out on an ABI Prism 377 DNA
Sequencer using the ABI Prism dye terminator cycle sequencing ready reaction kit and
AmpliTaq DNA polymerase FS (Perkin-Elmer/ABI, Foster City, CA). Sequencing service
was provided by either the DBS Automated DNA Sequencing Facility, UC Davis, or Davis
Sequencing Inc. (Davis, CA). Data were analyzed by ABI Prism Sequencing 2.1.1 software
and the Genetics Computer Group program (Madison, WI).

Polymerase chain reaction (PCR).

Primers were synthesized at the Protein Structure Laboratory, UC Davis.
PCR was carried out on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI) with Taq
polymerase and buffer from Promega. A typical PCR mixture consisted of 5 ng of S.
globisporus genomic or plasmid DNA as template, 25 pmoles of each primer, 25 µM dNTP,
5% DMSO, 2 units of Taq polymerase, 1 x buffer, with or without 20% glycerol in a final
volume of 50 µL. The PCR temperature program was as follows: initial denaturing at 94°C
for 5 min, 24-36 cycles of 45 sec at 94°C, 1 min at 60°C, 2 min at 72°C, followed by
an additional 7 min at 72°C.
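
As a rough worked check of the reaction setup described above, the following sketch converts the stated primer amount into a concentration and totals the programmed hold times. The function names are illustrative, and ramp times between temperature steps are ignored (an assumption), so actual run times on a thermal cycler would be longer.

```python
def primer_concentration_um(pmol: float, volume_ul: float) -> float:
    """Convert a primer amount (pmol) in a reaction volume (µL) to µM.

    1 pmol/µL equals 1 µM, so the conversion is a simple ratio.
    """
    return pmol / volume_ul


def program_minutes(cycles: int) -> float:
    """Total programmed hold time in minutes (ramp times not included)."""
    initial_denaturation = 5.0        # 94°C for 5 min
    per_cycle = 0.75 + 1.0 + 2.0      # 45 sec at 94°C, 1 min at 60°C, 2 min at 72°C
    final_extension = 7.0             # additional 7 min at 72°C
    return initial_denaturation + cycles * per_cycle + final_extension


if __name__ == "__main__":
    # 25 pmol of each primer in a 50-µL reaction
    print(primer_concentration_um(25, 50), "µM of each primer")
    # Shortest and longest programs in the stated 24-36 cycle range
    print(program_minutes(24), "min for 24 cycles")
    print(program_minutes(36), "min for 36 cycles")
```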

For type II PKS, the following two pairs of degenerate primers were used:
5'-AGC TCC ATC AAG TCS ATG RTC GG-3' (forward, SEQ ID NO:103) / 5'-CC GGT
GTT SAC SGC GTA GAA CCA GGC G-3' (reverse, SEQ ID NO:104) and 5'-GAC ACV
GCN TGY TCB TCV-3' (forward, SEQ ID NO:105) / 5'-RTG SGC RTT VGT NCC RCT-3'
(reverse, SEQ ID NO:106) (B, C+G+T; N, A+C+G+T; R, A+G; S, C+G; V, A+C+G; Y, C+T)
(Seow et al. (1997) J. Bacteriol. 179: 7360-7368). No product was amplified
under any of the conditions tested. For type I PKS, the following pair of degenerate primers
was used: 5'-GCS TCC CGS GAC CTG GGC TTC GAC TC-3' (forward, SEQ ID NO:107) /
5'-AG SGA SGA SGA GCA GGC GGT STC SAC-3' (S, G+C) (reverse, SEQ ID NO:108)
(Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522). A distinctive product with the
predicted size of 0.75 kb was amplified in the presence of 20% glycerol and cloned into
pGEM-T according to the protocol provided by the manufacturer (Promega) to yield
pBS1001.

For NGDH, the following pair of degenerate primers was used: 5'-CS GGS
GSS GCS GGS TTC ATC GG-3' (forward, SEQ ID NO:109) / 5'-GG GWR CTG GYR
SGG SCC GTA GTT G-3' (R, A+G; S, C+G; W, A+T; Y, C+T) (reverse, SEQ ID NO:110)
(Decker et al. (1996) FEMS Lett. 141: 195-201). A distinctive product with the predicted
size of 0.55 kb was amplified and cloned into pGEM-T to yield pBS1002.

For cagA, the following pair of primers, flanking its coding region, was
used: 5'-AG GTG GAG GCG CTC ACC GAG-3' (forward, SEQ ID NO:111) / 5'-G GGC
GTC AGG CCG TAA GAA G-3' (reverse, SEQ ID NO:112) (Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). A distinctive product with the predicted size of 0.73
kb was amplified from pBS1005 and cloned into pGEM-T to yield pBS1003.
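
To make the degenerate-primer notation above concrete, the following sketch counts and enumerates the individual oligonucleotides represented by a degenerate primer. It uses only the ambiguity codes defined in the text (B, N, R, S, V, W, Y); the function names are illustrative and are not part of the described method.

```python
from itertools import product

# Ambiguity codes as defined in the text, plus the four unambiguous bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "N": "ACGT", "R": "AG", "S": "CG",
    "V": "ACG", "W": "AT", "Y": "CT",
}


def normalize(primer: str) -> str:
    """Strip the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in normalize(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list:
    """Enumerate every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in normalize(primer)]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    ngdh_forward = "CS GGS GSS GCS GGS TTC ATC GG"            # SEQ ID NO:109
    type1_pks_forward = "GCS TCC CGS GAC CTG GGC TTC GAC TC"  # SEQ ID NO:107
    print(degeneracy(ngdh_forward))
    for seq in expand(type1_pks_forward):
        print(seq)
```

Enumerating with `expand` is practical only for primers of low degeneracy; for highly degenerate primers, `degeneracy` alone gives the pool size without building the list.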

Genomic library construction and screening. 

S. globisporus genomic DNA was partially digested with MboI to yield a
smear around 60 kb, as monitored by electrophoresis on a 0.3% agarose gel. This sample
was dephosphorylated upon treatment with shrimp alkaline phosphatase and ligated into the
E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene 116: 43-69) that was
prepared by digestion with HpaI, shrimp alkaline phosphatase treatment, and additional
digestion with BamHI. The resulting ligation mixture was packaged with the Gigapack II
XL two-component packaging extract (Stratagene). The package mixture was transduced
into E. coli XL1-Blue MR. The transduced cells were spread onto LB plates containing
apramycin (100 µg/mL) and incubated at 37°C overnight. The titer of the primary library
was approximately 6,000 colony-forming units per µg of DNA. Restriction enzyme analysis
of twelve randomly selected cosmids confirmed that the average size of inserts was about 35
to 45 kb (Rao et al. (1987) Methods Enzymol. 153: 166-198).

To screen the genomic library, colonies from five LB plates containing
apramycin (100 µg/mL, with approximately 2,000 colonies per plate) were transferred to
nylon transfer membranes (Micro Separations, Inc., Westborough, MA) and screened by
colony hybridization with the PCR-amplified 0.55-kb NGDH fragment from pBS1002 as a
probe. The positive cosmid clones were re-screened by PCR with primers for NGDH and
confirmed by Southern hybridization (Sambrook et al., supra). Further restriction enzyme
mapping and chromosomal walking of these overlapping cosmids led to the genetic
localization of the 75-kb sgc gene cluster, as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). A 3.0-kb BamHI fragment from pBS1005 that hybridized to the NGDH
probe was cloned into the same sites of pGEM-3zf to yield pBS1007. Similarly, a 4.0-kb
BamHI fragment from pBS1005 that hybridized to the PCR-amplified 0.73-kb cagA probe
from pBS1003 was cloned into the same sites of pGEM-3zf to yield pBS1008 (Fig. 5B).
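
The adequacy of screening about 10,000 colonies (five plates of approximately 2,000 colonies) can be illustrated with the standard library-representation estimate P = 1 - (1 - f)^N, where f is the ratio of insert size to genome size. The sketch below is not part of the described method: the genome size of 8,000 kb is an assumed, typical value for a Streptomyces chromosome and is not given in the text, and the 40-kb insert is the midpoint of the 35 to 45 kb range reported above.

```python
import math

ASSUMED_GENOME_KB = 8000.0   # assumption: typical Streptomyces genome size, not from the text
MEAN_INSERT_KB = 40.0        # midpoint of the reported 35-45 kb insert range
COLONIES_SCREENED = 5 * 2000 # five plates, approximately 2,000 colonies each


def fold_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_kb / genome_kb


def probability_locus_present(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """P = 1 - (1 - f)^N: chance that a given locus is in at least one clone."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(p: float, insert_kb: float, genome_kb: float) -> int:
    """Smallest N giving probability p of recovering a given locus."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


if __name__ == "__main__":
    print(fold_coverage(COLONIES_SCREENED, MEAN_INSERT_KB, ASSUMED_GENOME_KB))
    print(probability_locus_present(COLONIES_SCREENED, MEAN_INSERT_KB, ASSUMED_GENOME_KB))
    print(clones_needed(0.99, MEAN_INSERT_KB, ASSUMED_GENOME_KB))
```

Under these assumed values the library represents roughly 50 genome equivalents, so a single-copy locus would be expected in a few dozen of the screened colonies.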

Generation of sgcA mutants by insert-directed homologous recombination in S.
globisporus.

A 1.0-kb KpnI fragment from pBS1007, containing the C-terminal truncated
sgcA, was subcloned into pGEM-3zf to yield pBS1009. An internal fragment of sgcA was
moved sequentially as a 0.75-kb SacII/SphI fragment from pBS1009 into the same sites of
pGEM-5zf to yield pBS1010 and as a 0.75-kb SacI/SphI fragment from pBS1010 into the
same sites of pGEM-3zf to yield pBS1011. The latter plasmid was digested with EcoRI and
HindIII, and the resulting 0.75-kb EcoRI/HindIII fragment was cloned into the same sites of
pOJ260 and pKC1139 (Bierman et al. (1992) Gene 116: 43-69) to yield pBS1012 and
pBS1013, respectively.

Introduction of pBS1012 and pBS1013 into S. globisporus was carried out by
either polyethylene glycol (PEG)-mediated protoplast transformation (Hopwood et al. (1985)
Genetic manipulation of Streptomyces: a laboratory manual. John Innes Foundation,
Norwich, UK) or E. coli-S. globisporus conjugation (Bierman et al. (1992) Gene 116: 43-69;
Matsushima and Baltz (1996) Microbiology 142: 261-267; Matsushima et al. (1994) Gene
146: 39-45), methods for both of which were developed recently in our laboratory. In brief,
for transformation, pBS1012 and pBS1013 were propagated in E. coli ET12567 (MacNeil et
al. (1992) Gene 111: 61-68), and the resulting double-stranded plasmid DNA was denatured by
alkaline treatment (Oh and Chater (1997) J. Bacteriol. 179: 122-127). The latter DNA (5
µL) and 200 µL of 25% PEG 1000 in P buffer (Hopwood et al., supra) were sequentially
added to 50 µL of S. globisporus protoplasts (10^9) in P buffer. The resulting suspension was
mixed immediately and spread on R2YE plates. After incubation at 28°C for 16 to 20 hrs,
the plates were overlaid with soft R2YE (0.7% agar) containing apramycin (100 µg/mL, final
concentration); incubation continued until colonies appeared (in 5 to 7 days). For
conjugation, E. coli S17-1(pBS1012) or E. coli S17-1(pBS1013) was grown to an OD600 of
0.3 to 0.4. Cells from a 20-mL culture were pelleted by centrifugation, washed in LB, and
resuspended in 2 mL of LB as the E. coli donors. S. globisporus spores (10^3 to 10^9) were
washed, resuspended in TSB, and incubated at 50°C for 10 min to activate germination.
After additional incubation at 37°C for 2 to 5 hrs, the spores were pelleted and resuspended
in LB as the S. globisporus recipients. The donors (100 µL) and recipients (100 µL) were
mixed and spread equally onto two modified ISP-4 or AS-1 plates freshly supplemented with
10 mM MgCl2 (see Media and culture conditions). The plates were incubated at 28°C for 16
to 22 hrs. After removal of most of the E. coli S17-1 donors by washing the surface with
sterile water, the plates were overlaid with 3 mL of soft LB (0.7% agar) containing nalidixic
acid (50 µg/mL, final concentration) and apramycin (100 µg/mL, final concentration) and
incubated at 28°C until exconjugants appeared (in approximately 5 days).

Unlike pBS1012, which is a Streptomyces non-replicating plasmid, pBS1013
bears a temperature-sensitive Streptomyces replication origin (Bierman et al. (1992) Gene
116: 43-69; Muth et al. (1989) Mol. Gen. Genet. 219: 341-348) that is unable to replicate at
temperatures above 34°C (Table 3), while the S. globisporus wild-type strain grows normally
up to 37°C. Thus, spores of S. globisporus (pBS1013), from either the transformants or the
exconjugants, were spread onto R2YE plates containing apramycin (100 µg/mL). The plates
were incubated directly at 37°C, and mutants, resulting from single crossover homologous
recombination between pBS1013 and the S. globisporus chromosome, were readily obtained
in 7 to 10 days. Alternatively, the plates were first incubated at 28°C for 2 days until
pinpoint-size colonies became visible and then shifted to 37°C to continue incubation.
Mutants resulting from single crossover homologous recombination grew out of the original
pinpoint-size colonies as easily distinguishable sectors in 7 to 10 days.

Construction of the sgcA and sgcB expression plasmids.

pBS1007 was digested with EcoRI, and made blunt-ended by treatment with
the Klenow fragment of DNA polymerase I. Upon additional digestion with SphI, the
resulting 2.0-kb blunt-ended SphI fragment containing the intact sgcA gene was cloned into
the SmaI/SphI sites of pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci. USA 93: 6600-6604)
to yield pBS1014. The latter was digested with EcoRI and HindIII, and the resulting
2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pWHM3 (Vara et al.
(1989) J. Bacteriol. 171: 5872-5881) to yield pBS1015, in which the expression of sgcA is
under the control of the ermE* promoter (Bibb et al. (1994) Mol. Microbiol. 14: 533-545).

Alternatively, pBS1007 was digested with KpnI, removing most of the sgcA
gene, and the 5.2-kb KpnI fragment was recovered and self-ligated to yield pBS1016. The
ermE* promoter was subcloned from pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci.
USA 93: 6600-6604) as a 0.45-kb EcoRI/SacI fragment and cloned into the same sites of
pBS1016 to yield pBS1017. The latter was digested with EcoRI and HindIII, and the
resulting 2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pKC1139 to yield
pBS1018, in which the expression of sgcB is under the control of the ermE* promoter.

Determination of C-1027 production. 

The production of C-1027 was detected by assaying its antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579). From liquid culture,
fermentation supernatant (180 µL) was added to stainless steel cylinders placed on LB plates
pre-seeded with overnight M. luteus culture (0.01% vol/vol). From solid culture, a small
square block (0.5 x 0.5 x 0.5 cm³) of agar from either R2YE or ISP-4 medium was directly
placed on M. luteus-seeded LB plates. The plates were incubated at 37°C for 24 hrs, and
C-1027 production was estimated by measuring the size of inhibition zones.

Nucleotide sequence accession number. 

The nucleotide sequence reported here has been deposited in the GenBank
database with the accession number AF201913.

Results.

No polyketide synthase gene was amplified by PCR from S. globisporus.

On the assumption that the C-1027 enediyne core is of polyketide origin, the
PCR approach was adopted to screen S. globisporus for any putative PKS genes, although it
is far from certain a priori if the biosynthesis of the enediyne core involves a PKS and, if so,
whether the enediyne PKS will exhibit a type I or type II structural organization. PCR
methods for cloning either type I or type II PKS genes have been developed, and these
methods have proven to be very effective in cloning PKS genes from various
polyketide-producing actinomycetes (Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522; Seow et al.
(1997) J. Bacteriol. 179: 7360-7368). While no distinctive product was amplified under any of the
conditions examined with either pair of primers designed for type II PKS, a single product
with the expected size of 0.75 kb was readily amplified by PCR from S. globisporus with
primers designed for type I PKS, which was subsequently cloned (pBS1001). Intriguingly,
sequence analysis of six randomly selected pBS1001 clones yielded an identical product,
indicative of a specific PCR amplification. The deduced amino acid sequence of this product,
however, showed no homology to known PKSs (data not shown), excluding the possibility
of using a PKS gene as a probe to identify the sgc biosynthesis gene cluster.

Cloning of a putative NGDH gene by PCR from S. globisporus. 

The biosynthesis of various deoxyhexoses shares a common key
intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs, whose
formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme, an
NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 223-256;
Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York). The PCR method was adopted to clone the putative
NGDH gene from S. globisporus with primers designed according to the homologous
regions of various NGDH enzymes from actinomycetes (Decker et al. (1996) FEMS Lett.
141: 195-201), resulting in the amplification of a single product with the expected size of
0.55 kb (pBS1002). Sequence analysis of pBS1002 confirmed its identity as a part of a
putative NGDH gene.

To clone the complete NGDH gene, an S. globisporus genomic library,
constructed in the E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene
116: 43-69; Rao et al. (1987) Methods Enzymol. 153: 166-198), was analyzed by Southern
hybridization with the PCR-amplified 0.55-kb fragment from pBS1002 as a probe. Of the
10,000 colonies screened, 36 positive colonies were identified, 9 of which were confirmed
by PCR to harbor the NGDH gene. Restriction enzyme mapping showed that all of them
contained a single 3.0-kb BamHI fragment hybridizing to the NGDH probe. Additional
chromosomal walking from this locus eventually led to the localization of the 75-kb sgc gene
cluster, covered by 18 overlapping cosmids as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). The 3.0-kb BamHI fragment was subcloned (pBS1007) (Fig. 5B), and
its nucleotide (nt) sequence was determined.

Analysis of the DNA sequences of the sgcA and sgcB genes.

Two complete open reading frames (ORFs) (sgcA and sgcB) were identified
within the 3.0-kb BamHI fragment of pBS1007, the 3,035-nt sequence of which is shown in
Figure 6. The sgcA gene most likely begins with an ATG at nt 101, preceded by a probable
ribosome binding site (RBS), GGAGG, and ends with a TGA stop codon at nt 1099. sgcA
should therefore encode a 332-amino acid protein with a molecular weight of 36,341 and an
isoelectric point of 6.01. A Gapped-BLAST search showed that the deduced sgcA gene
product is highly homologous to various putative and known NGDH enzymes from
antibiotic-producing actinomycetes, including Gdh from the erythromycin biosynthesis gene
cluster in Saccharopolyspora erythraea (64% identity and 70% similarity) (Linton et al.
(1995) Gene 153: 33-40), MtmE from the mithramycin biosynthesis gene cluster in
Streptomyces argillaceus (64% identity and 68% similarity) (Lombo et al. (1997) J.
Bacteriol. 179: 3354-3357), and TylA2 from the tylosin biosynthesis gene cluster in
Streptomyces fradiae (62% identity and 68% similarity) (Merson-Davies and Cundliffe
(1994) Mol. Microbiol. 13: 349-355) (Fig. 7). A conserved sequence of 14 amino acid
residues close to the N-termini can be easily identified in these proteins. This sequence has been
described as a βαβ fold with an NAD+-binding motif, GxGxxG (Fig. 7, boxed), consistent
with the biochemical role of these proteins in deoxyhexose biosynthesis (Liu and Thorson (1994) Ann. Rev.
Microbiol. 48: 223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd
ed. W. R. Strohl (ed). Marcel Dekker, New York). The function of Gdh and MtmE as
TDP-glucose 4,6-dehydratases, requiring NAD+ as a cofactor, has been confirmed by an enzyme
assay following expression of the gdh (Linton et al. (1995) Gene 153: 33-40) and mtmE genes
(Lombo et al. (1997) J. Bacteriol. 179: 3354-3357) in E. coli, respectively, and by
purification of the Gdh protein from Sacc. erythraea (Vara et al. (1989) J. Bacteriol. 171:
5872-5881). From these data, it is reasonable to suggest that sgcA encodes the NGDH
enzyme required for the biosynthesis of the 4,6-dideoxy-4-dimethylamino-5-methylrhamnose
moiety of the C-1027 chromophore.
Transcribed in the same direction as sgcA, the sgcB gene is located 43 nt
downstream of sgcA. It should begin with a GTG at nt 1143, preceded by a probable RBS,
AGGAG, and end with a TGA at nt 2708 (Fig. 6). Correspondingly, sgcB should
encode a 521-amino acid protein with a molecular weight of 52,952 and an isoelectric point
of 4.64. Database comparison of the deduced sgcB product revealed that SgcB is closely
related to a family of membrane efflux pumps, such as LfrA from Mycobacterium smegmatis
(43% identity and 50% similarity, protein accession number AAC43550) (Takiff et al.
(1996) Proc. Natl. Acad. Sci. USA 93: 362-366), OrfA from Streptomyces cinnamomeus
(42% identity and 47% similarity, protein accession number AAB71209) (Sommer et al.
(1997) Appl. Environ. Microbiol. 63: 3553-3560), and RifP from the rifamycin biosynthesis
gene cluster in Amycolatopsis mediterranei (35% identity and 44% similarity, protein
accession number AAC01725) (August et al. (1998) Chem. Biol. 5: 69-79). These proteins are
membrane-localized transporters involved in the transport of antibiotics (conferring
resistance), sugars, and other substances. While direct evidence is lacking for RifP
conferring rifamycin resistance in A. mediterranei by transporting it out of the cells (August
et al. (1998) Chem. Biol. 5: 69-79), it has been proven that LfrA employs the
transmembrane proton gradient in an antiporter mode to drive the efflux of intracellular
antibiotics, resulting in fluoroquinolone resistance in M. smegmatis (Takiff et al. (1996)
Proc. Natl. Acad. Sci. USA 93: 362-366). On the basis of the high degree of amino acid
sequence conservation, an equivalent role could be proposed for SgcB, conferring resistance
by exporting C-1027 from S. globisporus.
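
The ORF coordinates reported above can be cross-checked against the stated protein lengths and the intergenic distance with simple arithmetic. The sketch below assumes that each quoted end position (nt 1099 and nt 2708) is the last nucleotide of the TGA stop codon, which the text does not state explicitly; the function names are illustrative.

```python
def orf_summary(start_nt: int, stop_last_nt: int) -> tuple:
    """Return (ORF length in nt including the stop codon, encoded amino acids).

    Assumption: `stop_last_nt` is the final nucleotide of the stop codon.
    """
    length_nt = stop_last_nt - start_nt + 1
    if length_nt % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    codons = length_nt // 3
    return length_nt, codons - 1  # the stop codon encodes no amino acid


def intergenic_gap(upstream_stop_last_nt: int, downstream_start_nt: int) -> int:
    """Number of nucleotides strictly between two ORFs."""
    return downstream_start_nt - upstream_stop_last_nt - 1


if __name__ == "__main__":
    print("sgcA:", orf_summary(101, 1099))    # ATG at nt 101, TGA ending at nt 1099
    print("sgcB:", orf_summary(1143, 2708))   # GTG at nt 1143, TGA ending at nt 2708
    print("gap :", intergenic_gap(1099, 1143))
```

Under this assumption the coordinates reproduce the 332- and 521-amino acid products and the 43-nt spacing given in the text.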

The cagA gene is clustered with the sgcA and sgcB locus.

To determine if cagA is clustered with the sgcA and sgcB locus, PCR primers
were designed according to the flanking regions of cagA (Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). A single product with the predicted size of 0.73 kb was
indeed amplified from several of the overlapping cosmids (which cover the 75-kb sgc
cluster), including pBS1004 and pBS1005, the identity of which as cagA was confirmed by
sequencing. Restriction enzyme mapping and Southern hybridization analysis localized
cagA to a single 4.0-kb BamHI fragment that is approximately 14 kb upstream of the sgcA,B
locus (Fig. 5B). The 4.0-kb BamHI fragment was subcloned (pBS1008), and its nt sequence
was determined, revealing the cagA gene along with two additional ORFs (data not shown)
(Fig. 5). As reported earlier, cagA encodes a 142-amino acid protein that is processed by
cleavage of a 32-amino acid leader peptide to yield the mature CagA apoprotein (Sakata et al.
(1992) Biosci. Biotech. Biochem. 56: 1592-1595).

Disruption of the sgcA gene in S. globisporus. 

To examine if the cloned sgc cluster encodes C-1027 biosynthesis, sgcA was
insertionally disrupted by a single crossover homologous recombination event to generate
C-1027-nonproducing mutant strains (Fig. 8A). Two plasmids were used: pBS1012 (a
pOJ260 derivative) and pBS1013 (a pKC1139 derivative), each of which contains a 0.75-kb
internal fragment from sgcA (Table 3). After introduction of pBS1012 into S. globisporus
either by PEG-mediated protoplast transformation or E. coli-S. globisporus conjugation,
transformants or exconjugants that were resistant to apramycin were isolated in all cases.
Since pBS1012 is derived from the Streptomyces non-replicating plasmid pOJ260, these
isolates must have resulted from integration of pBS1012 into the S. globisporus chromosome
by homologous recombination. Plasmid pBS1013 was similarly introduced into S.
globisporus. However, since pBS1013 is derived from pKC1139, which carries the
temperature-sensitive Streptomyces replication origin from pSG5 and can replicate normally
at 28°C (Muth et al. (1989) Mol. Gen. Genet. 219: 341-348), these isolates were subjected to
incubation at the non-permissive temperature of 37°C to eliminate free plasmids from the
host cells. As expected, normal growth stopped except for the recombinants that continued to
grow at 37°C, indicative of integration of pBS1013 into S. globisporus by homologous
recombination. The apramycin-resistant S. globisporus SB1001 and S. globisporus SB1002
strains were chosen as representatives of mutant strains with a disrupted sgcA gene resulting
from integration of pBS1012 and pBS1013, respectively.

To confirm that targeted sgcA disruption has occurred by a single crossover
homologous recombination event, Southern analysis of the DNA from the mutant strains was
performed as exemplified for S. globisporus SB1001 with either pOJ260 or the 0.75-kb
SacII/KpnI internal fragment of sgcA from pBS1010 as a probe. As shown in Fig. 8B, a
distinctive band of the predicted size of 6.3 kb was detected with the pOJ260 vector as a
probe in all mutant strains (lanes 2, 3, and 4); this band was absent from the wild-type strain
(lane 1). Complementarily, when using the 0.75-kb SacII/KpnI internal fragment of sgcA as
a probe (Fig. 8C), the 3.0-kb band in the wild-type strain (lane 1) was split into two
fragments with the size of 6.3 kb and 1.0 kb in the mutant strains (lanes 2, 3, and 4), as
would be expected for disruption of sgcA by a single crossover homologous recombination
event.
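
The band pattern above can be checked with simple fragment arithmetic: a single crossover inserts the whole plasmid into the wild-type fragment, so the two junction fragments should together equal the wild-type fragment plus the plasmid. The sketch below is only an illustration under stated assumptions: the pOJ260 backbone size of about 3.5 kb is an assumed value not given in the text, the digest is assumed to cut once within the integrated plasmid, and the tolerance reflects the approximate nature of gel-based size estimates.

```python
WILD_TYPE_FRAGMENT_KB = 3.0     # band detected in the wild-type strain
SGCA_INTERNAL_FRAGMENT_KB = 0.75
ASSUMED_POJ260_KB = 3.5         # assumption: vector backbone size, not stated in the text


def expected_total_kb(wild_type_kb: float, vector_kb: float, insert_kb: float) -> float:
    """Size of the locus after the whole plasmid integrates by a single crossover."""
    return wild_type_kb + vector_kb + insert_kb


def consistent_with_single_crossover(observed_kb, expected_kb: float,
                                     tolerance_kb: float = 0.3) -> bool:
    """True if the observed junction fragments sum to the expected total."""
    return abs(sum(observed_kb) - expected_kb) <= tolerance_kb


if __name__ == "__main__":
    total = expected_total_kb(WILD_TYPE_FRAGMENT_KB, ASSUMED_POJ260_KB,
                              SGCA_INTERNAL_FRAGMENT_KB)
    print("expected total (kb):", total)
    print("mutant bands consistent:", consistent_with_single_crossover([6.3, 1.0], total))
    print("wild-type band consistent:", consistent_with_single_crossover([3.0], total))
```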

S. globisporus SB1001 and S. globisporus SB1002 are C-1027-nonproducing
mutants.

No apparent difference in growth characteristics and morphologies between
the wild-type S. globisporus and mutant S. globisporus SB1001 and S. globisporus SB1002
strains was observed. While C-1027 production in the wild-type S. globisporus strain could
be detected on day 3, peaked on day 5, and continued for a few more days, as judged by
assaying the antibacterial activity of the culture supernatant against M. luteus (Hu et al. (1988)
J. Antibiot. 41: 1575-1579), C-1027 production was completely abolished in the sgcA mutant
strains S. globisporus SB1001 and S. globisporus SB1002 (Fig. 9A). The latter phenotype
was identical to that of the AF40, AF44, and AF67 mutants, C-1027-nonproducing S.
globisporus strains that have been characterized previously (Fig. 9A and 9C) (Mao et al.
(1997) Chinese J. Biotechnol. 13: 195-199).

In vivo complementation of S. globisporus SB1001. 

The ability of the wild-type sgcA gene to complement the disrupted sgcA gene
was tested in the S. globisporus SB1001 strain. The construction of pBS1015, in which the
expression of sgcA is under the control of the constitutive ermE* promoter, was described in
Materials and Methods. Both the pBS1015 construct and the pWHM3 vector as a control
were introduced by transformation into the S. globisporus SB1001 mutant strain. Culture
supernatants from each transformant were bioassayed against M. luteus for C-1027
production. pBS1015 restored C-1027 production in S. globisporus SB1001 to the wild-type
level; no C-1027 production was detected in the control in which pWHM3 was introduced
into S. globisporus SB1001 (Fig. 9B and 9C). A significant reduction of C-1027 production
was observed when S. globisporus SB1001(pBS1015) was cultured under identical conditions
but without thiostrepton (Fig. 9B vs. 9C), indicating that pBS1015 may be unstable in S.
globisporus SB1001 in the absence of antibiotic selection pressure.

Expression of sgcB in S. globisporus.

The effect of sgcB on C-1027 production was tested in the wild-type S.
globisporus strain. The construction of pBS1018, in which the expression of sgcB is under
the control of the constitutive ermE* promoter, was described in Materials and Methods.
pBS1018 and the pKC1139 vector as a control were each introduced by conjugation into S.
globisporus. Culture supernatants from each exconjugant were harvested on days 3, 4, and
5, and assayed for C-1027 production by determining the antibacterial activity against M.
luteus. While no apparent difference in C-1027 production was observed between the S.
globisporus and S. globisporus (pKC1139) strains, a significant increase in C-1027
production (150±25%) was evident in the early stage of S. globisporus (pBS1018)
fermentation (Fig. 9D, day 3). However, this effect on C-1027 production leveled off as the
fermentation proceeded and became insignificant when the culture reached the late stationary
phase of fermentation (Fig. 9D, days 4 and 5).

Discussion. 

Our inability to clone the putative enediyne PKS gene by PCR, with
degenerate primers designed according to the highly conserved amino acid sequences of
either type I or type II PKSs, or by DNA hybridization, with homologous type I or type II
PKS genes as probes (data not shown), was unexpected, since feeding experiments by
incorporation of [1-13C]- and [1,2-13C]acetate into the enediyne cores of esperamicin (Lam
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345), dynemicin (Tokiwa et al. (1992) J. Am.
Chem. Soc. 114: 4107-4110), and neocarzinostatin (Hensens et al. (1989) J. Am. Chem. Soc.
111: 3295-3299) supported their polyketide origin. Although the enediyne cores are
structurally distinct from both the reduced and the aromatic polyketides, whose
biosynthesis by type I and type II PKSs, respectively, is well characterized, it could be
imagined that an enediyne PKS catalyzes the biosynthesis of a polyunsaturated linear
heptaketide intermediate that is subsequently cyclized into the enediyne core structure (Hu
et al. (1994) Mol. Microbiol. 14: 163-172; Spaink et al. (1991) Nature 354: 125-130; Thorson
et al. (1999) Bioorg. Chem., 27: 172-188). Alternatively, Hensens and co-workers proposed a
fatty acid origin for the enediyne core that was also consistent with the isotope labeling
results. These authors suggested oleate as a precursor that is shortened by loss of carbons
from both ends and is desaturated via the oleate-crepenynate pathway to furnish the enediyne
core (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299). The latter pathway
resembles polyacetylene biosynthesis in higher plants and fungi and requires an
acetylene-forming enzyme; a plant gene encoding such an enzyme was identified recently
(Lee et al. (1998) Science 280: 915-918). Our DNA sequence analysis of approximately 60 kb
of the sgc gene cluster failed to reveal any gene that resembles a PKS.

Although little is known about the resistance mechanism for the enediyne
antibiotics in general, the apoproteins of the chromoprotein type of enediynes could be
viewed as resistance elements that confer self-resistance to the producing organisms by drug
sequestration (Thorson et al. (1999) Bioorg. Chem., 27: 172-188). Such a resistance
mechanism is in fact well established in antibiotic-producing actinomycetes, for example,
BlmA, the bleomycin-binding protein from Streptomyces verticillus (Shen et al. (1999)
Bioorg. Chem. 27: 155-171). Given that antibiotic production genes have invariably
been found to be clustered in one region of the microbial chromosome, consisting of
structural, resistance, and regulatory genes, we adopted a strategy to clone the sgc gene
cluster by mapping a putative C-1027 structural gene to the previously cloned cagA gene,
which is considered a resistance gene and encodes the C-1027 apoprotein.

We chose NGDH as the putative C-1027 structural gene on the basis of the
4,6-dideoxy-4-dimethylamino-5-methylrhamnose moiety of the C-1027 chromophore. It has
been well established that all deoxyhexoses can be derived from the common intermediate
4-keto-6-deoxyglucose nucleoside diphosphate, whose biosynthesis from glucose
nucleoside diphosphate is catalyzed by an NGDH enzyme. We cloned the NGDH gene from
S. globisporus by PCR and used it as a probe to screen an S. globisporus genomic library,
resulting in the isolation of the 75-kb sgc gene cluster. DNA sequence analysis of a 3.0-kb
BamHI fragment of the sgc cluster confirmed the presence of sgcA, which encodes the NGDH
protein, along with sgcB, which encodes a transmembrane efflux protein (Fig. 6). The cagA
gene indeed resides approximately 14 kb upstream of sgcA (Fig. 5); DNA sequence analysis
of a 4.0-kb BamHI fragment confirmed the identity of cagA along with two additional ORFs
(data not shown). These results underline once again the effectiveness of cloning natural
product biosynthesis gene clusters by exploiting the clustering of resistance and
structural genes.

The involvement of the cloned gene cluster in C-1027 biosynthesis was
demonstrated by disrupting the sgcA gene to generate S. globisporus mutants whose ability
to produce C-1027 was completely abolished (Fig. 9A), and by complementing the sgcA
mutants in vivo by expressing sgcA in trans to restore C-1027 production (Fig. 9B and 9C).
These data unambiguously establish that sgcA is essential for C-1027 production, and thus
support the conclusion that the cloned gene cluster encodes C-1027 biosynthesis. It
should be pointed out that, although the sgcA mutants S. globisporus SB1001 and S.
globisporus SB1002 were characterized as C-1027-nonproducing on the basis of the
antibacterial assay alone (Fig. 9A), this phenotype was identical to that of the control
AF40, AF44, and AF67 mutants (Fig. 9A and 9C). The latter strains were isolated
previously by random mutagenesis of the wild-type S. globisporus strain with acriflavine
and were confirmed to be C-1027-nonproducing by both the antibacterial bioassay and an
antitumor spermatogonial assay (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199),
providing strong support for the current study. Gene disruption and complementation in S.
globisporus were made possible by the recently developed genetic system that allowed us to
introduce plasmid DNA into S. globisporus via either PEG-mediated protoplast
transformation (Hopwood et al. (1985) Genetic manipulation of Streptomyces: a laboratory
manual. John Innes Foundation, Norwich, UK) or E. coli-S. globisporus conjugation
(Bierman et al. (1992) Gene 116: 43-69; Matsushima and Baltz (1996) Microbiology 142:
261-267; Matsushima et al. (1994) Gene 146: 39-45) for analyzing the sgc biosynthesis gene
cluster in vivo. Given the difficulties encountered with calicheamicin biosynthesis in
Micromonospora echinospora, into which all attempts to introduce plasmid DNA have failed
(Thorson et al. (1999) Bioorg. Chem., 27: 172-188), the latter results underscore the
importance of selecting C-1027 as a model system for enediyne biosynthesis, so that many of
the genetic tools developed in Streptomyces species can now be directly applied to the study
of enediyne biosynthesis.

Finally, the function of sgcB was probed by examining C-1027 production
following expression of the gene in the wild-type S. globisporus strain. Database
comparison of the deduced amino acid sequence clearly suggested that SgcB is a
transmembrane efflux protein, conferring resistance by exporting C-1027 out of the cell.
Hence, in addition to CagA, SgcB could be viewed as the second resistance element
identified for C-1027 biosynthesis. Multiple resistance genes have been identified in
numerous antibiotic biosynthesis gene clusters (Hopwood (1997) Chem. Rev. 97: 2465-2497).
It could be imagined that CagA and SgcB function cooperatively to provide resistance: the
C-1027 chromophore is first sequestered by binding to the preapoprotein CagA to form a
complex, which is then transported out of the cell by the efflux pump SgcB and processed by
removing the leader peptide to yield the chromoprotein, although we do not have any
experimental data to substantiate this speculation. Since it is known that yields for
antibiotic production can be profoundly altered by the introduction of extra copies of
regulatory, resistance, or structural genes into wild-type organisms (Hutchinson (1994)
Bio/Technology 12: 375-380), we tested the effect of overexpressing sgcB in S. globisporus
on C-1027 production. While no apparent adverse effect on C-1027 production was observed
upon introduction of the pKC1139 vector into S. globisporus (data not shown), a significant
increase in C-1027 production (150±25%) was observed in the early stage of S. globisporus
(pBS1017) fermentation (Fig. 9D, day 3), supporting the predicted function for SgcB in
C-1027 biosynthesis. We propose that C-1027 resistance could be a limiting factor at the
onset of C-1027 production, that this limitation is circumvented by the extra copy of the
plasmid-borne sgcB, and that overexpression of sgcB under the control of the constitutive
ermE* promoter therefore results in an increase in C-1027 production. However, as the S.
globisporus (pBS1017) fermentation proceeds to its stationary phase, C-1027 resistance is
no longer a limiting factor for overall C-1027 production, and the effect of the extra copy
of sgcB on C-1027 production consequently becomes insignificant (Fig. 9D, day 5).

In conclusion, genetic analysis of enediyne biosynthesis has heretofore met
with little success in spite of considerable effort (Thorson et al. (1999) Bioorg. Chem., 27:
172-188). The localization of the sgc gene cluster and characterization of the sgcA and sgcB
genes have now provided an excellent basis for genetic and biochemical investigations
and/or modification of C-1027 biosynthesis, and gene disruption and overexpression in S.
globisporus clearly demonstrated the potential to construct enediyne-overproducing strains
and to produce novel enediynes that may have enhanced potency as anticancer drugs
using combinatorial biosynthesis and targeted mutagenesis. We envisage that the results
from C-1027 biosynthesis should facilitate the cloning and characterization of biosynthesis
gene clusters of other enediyne antibiotics in Streptomyces as well as in other actinomycetes,
and could have a great impact on the overall field of combinatorial biosynthesis.

It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of
this application and scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby incorporated by reference in their entirety for all
purposes.

CLAIMS

What is claimed is:


1. An isolated nucleic acid comprising a nucleic acid selected from the
group consisting of

a nucleic acid encoding any of C-1027 open reading frames (ORFs) -7
through 42, excluding ORF 9 (cagA);

a nucleic acid encoding a polypeptide encoded by any of C-1027 open
reading frames (ORFs) -7 through 42, excluding ORF 9 (cagA); and

a nucleic acid amplified by polymerase chain reaction (PCR) using
primer pairs that amplify any of C-1027 open reading frames (ORFs) -7 through 42,
excluding ORF 9 (cagA).

2. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a
nucleic acid encoding at least two open reading frames (ORFs) selected from the group
consisting of ORF-1 through ORF 42, excluding ORF 9 (cagA).

3. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a
nucleic acid encoding at least three open reading frames (ORFs) selected from the group
consisting of ORF-1 through ORF 42, excluding ORF 9 (cagA).

4. An isolated nucleic acid comprising a nucleic acid that specifically
hybridizes under stringent conditions to an open reading frame (ORF) of the C-1027
biosynthesis gene cluster, excluding ORF 9 (cagA), and can substitute for the ORF to which
it specifically hybridizes to direct the synthesis of an enediyne.

5. The isolated nucleic acid of claim 4, wherein said isolated nucleic acid
comprises a nucleic acid that specifically hybridizes under stringent conditions to a nucleic
acid selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3,
ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10,
ORF 11, ORF 12, ORF 13, and ORF 14.



6. The isolated nucleic acid of claim 4, wherein said isolated nucleic acid
comprises a nucleic acid that specifically hybridizes under stringent conditions to a nucleic
acid selected from the group consisting of ORF 15, ORF 16, ORF 17, ORF 18, ORF 19,
ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29,
ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39,
ORF 40, ORF 41, and ORF 42.

7. The isolated nucleic acid of claim 5, wherein said isolated nucleic acid 
comprises a nucleic acid selected from the group consisting of ORF -7, ORF -6, ORF -5, 
ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, 
ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, and ORF 14. 

8. The isolated nucleic acid of claim 6, wherein said isolated nucleic acid 
comprises a nucleic acid selected from the group consisting of ORF 15, ORF 16, ORF 17, 
ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, 
ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, 
ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. 

9. The isolated nucleic acid of claim 4, wherein said nucleic acid
comprises a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid
selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2,
ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10,
ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20,
ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30,
ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40,
ORF 41, and ORF 42.

10. An isolated gene cluster comprising open reading frames encoding 
polypeptides sufficient to direct the assembly of a C-1027 enediyne or a C-1027 enediyne 
analogue. 

11. The gene cluster of claim 10, wherein said gene cluster is present in a
bacterium.

12. The gene cluster of claim 11, wherein said gene cluster is present in a
bacterium selected from the group consisting of Actinomycetes, Actinoplanetes,
Actinomadura, Micromonospora, and Streptomycetes.

13. The gene cluster of claim 11, wherein said gene cluster is present in a
bacterium selected from the group consisting of Streptomyces globisporus, Streptomyces
lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. calichensis,
Actinomadura verrucosospora, Micromonospora chersina, Streptomyces carzinostaticus, and
Actinomycete L585-6.

14. The gene cluster of claim 13, wherein one or more open reading 
frames is operatively linked to a heterologous promoter. 

15. An isolated polypeptide comprising a catalytic domain encoded by a
nucleic acid of a C-1027 gene cluster wherein said nucleic acid comprises a nucleic acid
selected from the group consisting of

a nucleic acid encoding any of C-1027 open reading frames (ORFs) -7
through 42, excluding ORF 9 (cagA); and

a nucleic acid amplified by polymerase chain reaction (PCR) using
any one of the primer pairs identified in Tables I and II that specifically amplify one or more
of (ORFs) -7 through 42, excluding ORF 9 (cagA).

16. The polypeptide of claim 15, wherein said polypeptide is encoded by 
at least two open reading frames selected from the group consisting of C-1027 open reading 
frames (ORFs) -7 through 42, excluding ORF 9 (cagA). 

17. The polypeptide of claim 15, wherein said polypeptide is encoded by
at least three open reading frames selected from the group consisting of C-1027 open reading
frames (ORFs) -7 through 42, excluding ORF 9 (cagA).

18. An expression vector comprising a nucleic acid of any one of claims 1 

through 9. 

19. A host cell transformed with an expression vector of claim 18. 

20. The host cell of claim 19, wherein said cell is transformed with an
exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct
the assembly of a C-1027 enediyne or a C-1027 enediyne analogue.



21. The host cell of claim 19, wherein said host cell is a bacterium.

22. The host cell of claim 21, wherein said bacterium is selected from the
group consisting of Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, and
Streptomycetes.

23. The host cell of claim 21, wherein said bacterium is selected from the
group consisting of Streptomyces globisporus, Streptomyces lividans, Streptomyces
coelicolor, Micromonospora echinospora ssp. calichensis, Actinomadura verrucosospora,
Micromonospora chersina, Streptomyces carzinostaticus, and Actinomycete L585-6.



24. A method of chemically modifying a biological molecule, said method
comprising contacting a biological molecule that is a substrate for a polypeptide encoded by
a C-1027 biosynthesis gene cluster open reading frame, with a polypeptide encoded by a
C-1027 biosynthesis gene cluster open reading frame whereby said polypeptide chemically
modifies said biological molecule.

25. The method of claim 24, wherein said polypeptide is an enzyme
selected from the group consisting of a hydroxylase, a homocysteine synthase, a
dNDP-glucose dehydrogenase, a citrate carrier protein, a C-methyl transferase, an N-methyl
transferase, an aminotransferase, a CagA apoprotein, an NDP-glucose synthase, an
epimerase, an acyl transferase, a coenzyme F390 synthase, an epoxide hydrolase, an
anthranilate synthase, a glycosyl transferase, a monooxygenase, a type II condensation
protein, an aminomutase, a type II adenylation protein, an O-methyl transferase, a P-450
hydroxylase, an oxidoreductase, and a proline oxidase.

26. The method of claim 24, wherein said method comprises contacting
said biological molecule with at least two different polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames.

27. The method of claim 24, wherein said method comprises contacting
said biological molecule with at least three different polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames.

28. The method of claim 24, wherein said contacting is in a host cell.

29. The method of claim 28, wherein said host cell is a bacterium.

30. The method of claim 24, wherein said contacting is ex vivo.

31. The method of claim 28, wherein said biological molecule is an
endogenous metabolite produced by said host cell.

32. The method of claim 28, wherein said biological molecule is an
exogenously supplied metabolite.

33. The method of claim 28, wherein said host cell is a eukaryotic cell. 

34. The method of claim 33, wherein said eukaryotic cell is selected from 
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an 
insect cell. 

35. The method of claim 28, wherein said host cell synthesizes sugars and
glycosylates the biological molecule.

36. The method of claim 35, wherein said host cell synthesizes
deoxy sugars.

37. The method of claim 24, wherein said method further comprises
contacting said biological molecule with a polyketide synthase or a non-ribosomal
polypeptide synthetase.

38. The method of claim 24, wherein said contacting is in a
bacterial cell.

39. The method of claim 24, wherein said contacting is ex vivo.

40. The method of claim 24, wherein said method comprises contacting
said biological molecule with substantially all of the polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames and said method produces an enediyne or
enediyne analogue.

41. The method of claim 24, wherein said biological molecule is a fatty
acid and said biological molecule is contacted with a C-1027 orf polypeptide selected from the
group consisting of an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, a
P-450 hydroxylase, an oxidoreductase, and a proline oxidase.

42. The method of claim 41, wherein said biological molecule is a fatty
acid and said biological molecule is contacted with a plurality of C-1027 orf polypeptides
comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, a P-450
hydroxylase, an oxidoreductase, and a proline oxidase.

43. The method of claim 42, wherein said biological molecule is contacted 
with polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35, 
and ORF38. 

44. The method of claim 41, wherein said biological molecule is contacted 
with polypeptides encoded by ORF 15, ORF 16, ORF 28, ORF3, ORF 14, and ORF 13. 

45. The method of claim 44 wherein said biological molecule is also 
contacted with polypeptides encoded by ORF 4 and ORF 3. 

46. The method of claim 24, wherein said method comprises contacting a 
sugar with one or more C-1027 open reading frame polypeptides selected from the group 
consisting of a dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an 
aminotransferase, a C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. 

47. The method of claim 46, wherein said method comprises contacting a 
dNDP-glucose with a plurality of C-1027 open reading frame polypeptides comprising a 
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a 
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. 

48. The method of claim 24, wherein said method comprises contacting an
amino acid with one or more C-1027 open reading frame polypeptides selected from
the group consisting of a hydroxylase, an aminomutase, a type II NRPS condensation
enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier protein.

49. The method of claim 48, wherein said method comprises contacting an
amino acid with a plurality of C-1027 open reading frame polypeptides comprising a
hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation enzyme, a type II
NRPS adenylation enzyme, and a type II peptidyl carrier protein.

50. The method of claim 48, wherein said amino acid is a tyrosine.

51. A method of synthesizing a chromoprotein-type enediyne core, said
method comprising contacting a fatty acid with one or more C-1027 orf polypeptides
selected from the group consisting of an epoxide hydrase, a monooxygenase, an iron-sulfur
flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase.

52. The method of claim 51, wherein said fatty acid is contacted with a
plurality of C-1027 orf polypeptides comprising an epoxide hydrase, a monooxygenase, an
iron-sulfur flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase.

53. The method of claim 52, wherein said fatty acid is contacted with
polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35, and
ORF38.

54. A method of synthesizing a deoxysugar, said method comprising
contacting a sugar with one or more C-1027 open reading frame polypeptides selected from
the group consisting of a dNDP-glucose synthase, a dNDP glucose dehydratase, an
epimerase, an aminotransferase, a C-methyltransferase, an N-methyltransferase, and a
glycosyl transferase.

55. The method of claim 54, wherein said method comprises contacting a
dNDP-glucose with a plurality of C-1027 open reading frame polypeptides comprising a
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase.

56. The method of claim 55, wherein said dNDP-glucose is contacted with
polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35, and
ORF38.

57. A method of synthesizing a beta amino acid, said method comprising
contacting an amino acid with one or more C-1027 open reading frame polypeptides
selected from the group consisting of a hydroxylase, an aminomutase, a type II NRPS
condensation enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier
protein.

58. The method of claim 57, wherein said method comprises contacting an
amino acid with a plurality of C-1027 open reading frame polypeptides comprising a
hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation enzyme, a type II
NRPS adenylation enzyme, and a type II peptidyl carrier protein.

59. The method of claim wherein said amino acid is contacted with
polypeptides encoded by ORF 4, ORF 11, ORF24, ORF23, ORF25, and ORF26.

60. The method of claim 57, wherein said amino acid is a tyrosine. 

61. A method of synthesizing an enediyne or an enediyne analogue, said
method comprising:

culturing a cell comprising a recombinantly modified C-1027 gene
cluster under conditions whereby said cell expresses said enediyne or enediyne analogue;
and

recovering said enediyne or enediyne analogue.

62. The method of claim 61, wherein said gene cluster is present in a 

bacterium. 

63. The gene cluster of claim 62, wherein said gene cluster is present in a
bacterium selected from the group consisting of Actinomycetes, Actinoplanetes,
Actinomadura, Micromonospora, and Streptomycetes.

64. The gene cluster of claim 62, wherein said gene cluster is present in a
bacterium selected from the group consisting of Streptomyces globisporus, Streptomyces
lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. calichensis,
Actinomadura verrucosospora, Micromonospora chersina, Streptomyces carzinostaticus, and
Actinomycete L585-6.

65. The method of claim 61, wherein said gene cluster is present in a
eukaryotic cell.

66. The method of claim 65, wherein said eukaryotic cell is selected from
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an
insect cell.

67. The method of claim 61, wherein said host cell synthesizes sugars and
glycosylates said enediyne or enediyne analogue.

68. The method of claim 67, wherein said host cell synthesizes 

deoxy sugars. 

69. A method of making a cell resistant to an enediyne or an enediyne
metabolite, said method comprising expressing in said cell one or more isolated C-1027 open
reading frame nucleic acids that encode a protein selected from the group consisting of a
CagA apoprotein, a SgcB transmembrane efflux protein, a transmembrane transport protein,
a Na+/H+ transporter, an ABC transporter, a glycerol phosphate transporter, and a UvrA-like
protein.

70. The method of claim 69, wherein said isolated C-1027 open reading
frame nucleic acids are selected from the group consisting of ORF 9, ORF2, ORF 27, ORF 0,
ORF 1 C-terminus, ORF 2, and ORF 1 N-terminus.

71. The method of claim 69, wherein said cell is a bacterial cell.

GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027

ABSTRACT OF THE DISCLOSURE

This invention provides nucleic acid sequences and characterization of the
gene cluster responsible for the biosynthesis of the enediyne C-1027 (produced by
Streptomyces globisporus). Methods are provided for the biosynthesis of enediynes,
enediyne analogs and other biological molecules.

ORF10: dNDP-glucose synthase, 355 aa
ORF1: dNDP-glucose dehydratase, 332 aa
ORF12: epimerase, 192 aa
ORF8: aminotransferase, 410 aa
ORF6: C-methyltransferase, 423 aa
ORF7: N-methyltransferase, 244 aa
ORF19: glycosyl transferase, 459 aa

Fig. 2

ORF4: Hydroxylase, 527 aa
ORF11: Hydroxylase/halogenase, 492/494 aa
ORF24: Aminomutase, 539 aa
ORF23: Type II NRPS condensation enzyme, 459 aa
ORF25: Type II NRPS adenylation enzyme, 716 aa
ORF26: Type II peptidyl carrier protein, 93 aa

Fig. 3A

R = enediyne core

ORF15: Anthranilate synthase I, 493 aa
ORF16: Anthranilate synthase II, 220 aa
ORF28: O-methyltransferase, 350 aa
ORF3: Coenzyme F390 synthetase, 463 aa
ORF14: Coenzyme F390 synthetase, 484 aa
ORF13: O-acyltransferase, 378 aa

Fig. 3B

Fatty acid

ORF17: Epoxide hydrase
ORF20: Monooxygenase
ORF21: Iron-sulfur flavoprotein
ORF29: P-450 hydroxylase
ORF30: Oxidoreductase
ORF32: Oxidoreductase
ORF35: Proline oxidase
ORF38: P-450 hydroxylase

ORF13: O-acyltransferase
ORF19: Glycosyl transferase
ORF23: Type II NRPS condensation enzyme

Fig. 4

PATENT APPLICATION DECLARATION



(Attorney's Docket No.: 2500.125US2) 

Each of the Applicants named below hereby declares as follows: 
1. My residence, post office address and country of citizenship given below 
are true and correct. 

2. I believe I am the original, first and joint inventor of the subject matter 
which is claimed and for which a patent is sought in the patent application entitled "GENE 
CLUSTER FOR PRODUCTION OF THE ENEDIYNE ANTITUMOR ANTIBIOTIC C-1027," 

Serial No. , filed January 5, 2000, and I have reviewed and understand the 

contents of the specification, including its claims. 

3. I acknowledge my duty to disclose to the Office all information known to 
me to be material to patentability of this application, in accordance with 37 C.F.R. Section 1.56, 
which is defined on the attached page. 

I further declare that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code, and that such willful false statements may jeopardize the validity of the application or any 
patent issuing thereon. 



Date: 



Ben Shen 
Residence and 1842 Rushmore Lane 
Post Office Address: Davis, California 95616 

(Citizenship: People's Republic of China) 



Date: 



Wen Liu 

Residence and Institute of Medicinal Biotechnology 
Post Office Address: Tiantan, Beijing, 100005, China 

(Citizenship: People's Republic of China) 



Serial No.: SERIAL NO. 






Date: 

Steven D. Christenson 
Residence and 1079 Monarch Lane 
Post Office Address: Davis, California 95616 

(Citizenship: United States) 



Date: 

Scott Standage 
Residence and 63 Tudor Road 
Post Office Address: Barnet, Herts, EN5 5NW, United Kingdom 
(Citizenship: United Kingdom) 



Section 1.56 Duty to Disclose Information Material to Patentability. 

(a) A patent by its very nature is affected with a public interest. The public interest is 
best served, and the most effective patent examination occurs when, at the time an application is being 
examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each 
individual associated with the filing and prosecution of a patent application has a duty of candor and good 
faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that 
individual to be material to patentability as defined in this section. The duty to disclose information exists 
with respect to each pending claim until the claim is cancelled or withdrawn from consideration, or the 
application becomes abandoned. Information material to the patentability of a claim that is cancelled or 
withdrawn from consideration need not be submitted if the information is not material to the patentability of 
any claim remaining under consideration in the application. There is no duty to submit information which 
is not material to the patentability of any existing claim. The duty to disclose all information known to be 
material to patentability is deemed to be satisfied if all information known to be material to patentability of 
any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by 
§§ 1.97(b)-(d) and 1.98. However, no patent will be granted on an application in connection with which 
fraud on the Office was practiced or attempted or the duty of disclosure was violated through bad faith or 
intentional misconduct. The Office encourages applicants to carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart 
application, and 

(2) the closest information over which individuals associated with the filing or 
prosecution of a patent application believe any pending claim patentably defines, to make sure that 
any material information contained therein is disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not cumulative 
to information already of record or being made of record in the application, and 

(1) It establishes, by itself or in combination with other information, a prima facie case 
of unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

(i) Opposing an argument of unpatentability relied on by the Office, or 

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a conclusion that a claim 
is unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim 
its broadest reasonable construction consistent with the specification, and before any consideration is given 
to evidence which may be submitted in an attempt to establish a contrary conclusion of patentability. 

(c) Individuals associated with the filing or prosecution of a patent application within 
the meaning of this section are: 

(1) Each inventor named in the application; 

(2) Each attorney or agent who prepares or prosecutes the application; and 

(3) Every other person who is substantively involved in the preparation or prosecution 
of the application and who is associated with the inventor, with the assignee or with anyone to whom 
there is an obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this section 
by disclosing information to the attorney, agent, or inventor. 



GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE 
ANTITUMOR ANTIBIOTIC C-1027 

SEQUENCE LISTING 

SEQ ID No. 1. C-1027 gene cluster DNA sequence from 1 to 42,000, ORF-(-7) to ORF-26 

GTCGACTCTAGAGGATCCCGGGTGCGGAGTAGGGGTTACGGACGAAGGAGGGGTGCCCGG 

1 + + + + + + 60 

CAGCTGAGATCTCCTAGGGCCCACGCCTCATCCCCAATGCCTGCTTCCTCCCCACGGGCC 
-7-* *LIGPASYPNRVFSPHG 

CGACGCCTGCGGCGAAGGGCGGTTCCTTGAGTTCGAGGCCGGTGGCGAGGACGACGTGGT 

61 + + + + + + 120 

GCTGCGGACGCCGCTTCCCGCCAAGGAACTCAAGCTCCGGCCACCGCTCCTGCTGCACCA 
-7 AVGAAFPPEKLELGTALVVH 

CCGCGTCGAGGATCTGCGTGTCGGGGAGCGGCCCAGGGCGCAGCCCCTCGGTCAGGTACG 

121 + + + + + + 180 

GGCGCAGCTCCTAGACGCACAGCCCCTCGCCGGGTCCCGCGTCGGGGAGCCAGTCCATGC 
-7 DADLIQTDPLPGPRLGETLY 

GGGTGAGGCCCCTGACGGTCACCTCGAAGCAGCGGTCGTGGGACCGGGCGTCGAGCGCCT 

181 + + + + + + 240 

CCCACTCCGGGGACTGCCAGTGGAGCTTCGTCGCCAGCACCCTGGCCCGCAGCTCGCGGA 
-7 PTLGRVTVEFCRDHSRADLA 

CCCCGTCCGCTTCCACAAGGACGACGCCGGGACAGGACTCCCGTGCGGCCTCGACCAGTC 

241 + + + + + + 300 

GGGGCAGGCGAAGGTGTTCCTGCTGCGGCCCTGTCCTGAGGGCACGCCGGAGCTGGTCAG 
-7 EGDAEVLVVGPCSERAAEVL 

GGGCGTCGAGGTAGTCCTGGAAGATGCGGCGGGGGGCGGGGCCCTGTTCGGTGAACTTCC 

301 + + + + + + 360 

CCCGCAGCTCCATCAGGACCTTCTACGCCGCCCCCCGCCCCGGGACAAGCCACTTGAAGG 
-7 RADLYDQFIRRPAPGQETFK 

ACGAAGCCCAGCGCCGGGGCCAGTCGCGCCGGTCGGCCTCCTGGTTGGCCCAGTTGATGA 

361 + + + + + + 420 

TGCTTCGGGTCGCGGCCCCGGTCAGCGCGGCCAGCCGGAGGACCAACCGGGTCAACTACT 
-7 WSAWRRPWDRRDAEQNAWNI 

AGTCGAGCACGTCCTCGCGGAACACCGACATCCTGCCGGCCTGGATATTGAAGACGTGGT 

421 + + + + + + 480 

TCAGCTCGTGCAGGAGCGCCTTGTGGCTGTAGGACGGCCGGACCTATAACTTCTGCACCA 
-7 FDLVDERFVSMRGAQINFVH 

CCCAGGGGTTGCCGTCACGGTGATAGGCGACGCCGGCCGAGCGGTAgGCGGCGCGCCGCT 

481 + + + + + + 540 

GGGTCCCCAACGGCAGTGCCACTATCCGCTGCGGCCGGCTCGCCATcCGCCGCGCGGCGA 
-7 DWPNGDRHYAVGASRYAARR 

CCAGGAGGACGACTTCCAGCGGTCTTCTCGCGAAATGAAGCAGGCGTATCGCGGTCGCCG 

541 + + + + + + 600 

GGTCCTCCTGCTGAAGGTCGCCAGAAGAGCGCTTTACTTCGTCCGCATAGCGCCAGCGGC 
-7 ELLVVELPRRAFHLLRIATA 

TGCCTGCCAGGCCCGCCCCTACGACCAGCACCCTGGGGCGCGCACCCGTCATGCCCATGA 

601 + + + + + + 660 

ACGGACGGTCCGGGCGGGGATGCTGGTCGTGGGACCCCGCGCGTGGGCAGTACGGGTACT 
-7-< TGALGAGVVLVRPRAGTMGM 







AGCCTCCCCCGCTGACTCAGGGCGgCGCGTCGCGCGCTCCCGTCGGTGTCCTCGCTGACT 



661 + + + + + + 720 

TCGGAGGGGGCGACTGAGTCCCGCcGCGCAGCGCGCGAGGGCAGCCACAGGAGCGACTGA 

GGAAGTTCCCTGACCTGGCGTCAACTCCACTGATCCGTAAGGGGATCGCGGGAGTGGATA 

721 + + + + + + 780 

CCTTCAAGGGACTGGACCGCAGTTGAGGTGACTAGGCATTCCCCTAGCGCCCTCACCTAT 

CGGGTCAGGTCGTGCACGATCGTGGCACCAGACAGATCACCACGTCGATAGGCACTCGTG 

781 + + + + + + 840 

GCCCAGTCCAGCACGTGCTAGCACCGTGGTCTGTCTAGTGGTGCAGCTATCCGTGAGCAC 

AGCCGCGCCCGGGGCTCGACGGGGCGGGGCACCGGCAGGGGCGGCCGCGTGATCAGCCGG 

841 + + + + + + 900 

TCGGCGCGGGCCCCGAGCTGCCCCGCCCCGTGGCCGTCCCCGCCGGCGCACTAGTCGGCC 

AGCCTGTCCGGGGGCGTGCGTGCGGGGCGTCAGCTGTCGATGTCGGGAACGCCAGGGACG 

901 + + + + + + 960 

TCGGACAGGCCCCCGCACGCACGCCCCGCAGTCGACAGCTACAGCCCTTGCGGTCCCTGC 
-6-* *SDIDPVGPV- 

TCGATCTCGGTGCGGGCGTAGTGGTTGAAGTAGTTGGTGTAGAGGTTCACGGCCACGTGG 

961 + + + + + + 1020 

AGCTAGAGCCACGCCCGCATCACCAACTTCATCAACCACATCTCCAAGTGCCGGTGCACC 
-6 DIETRAYHNFYNTYLNVAVH 

ACGAAGACCTCGGCGAGCTCGGTGTCCGTCCATCCCTGTGCCACGGCCGCGTTCCACGAG 

1021 + + + + + + 1080 

TGCTTCTGGAGCCGCTCGAGCCACAGGCAGGTAGGGACACGGTGCCGGCGCAAGGTGCTC 
-6 VFVEALETDTWGQAVAANWS 

GCGTCAGACGCCTCGCCCACTTCGCCGGCGATCTCCCTGGCCACCTGGACCAGTGCTTCG 

1081 + + + + + + 1140 

CGCAGTCTGCGGAGCGGGTGAAGCGGCCGCTAGAGGGACCGGTGGACCTGGTCACGAAGC 
-6 ADSAEGVEGAIERAVQVLAE- 

AGCTTCACGTCGTCGCCGGGCGTCCCCCGGCGAATCGCCACGGTCTCCTCCAGCGTGAAA 

1141 + + + + + + 1200 

TCGAAGTGCAGCAGCGGCCCGCAGGGGGCCGCTTAGCGGTGCCAGAGGAGGTCGCACTTT 
-6 LKVDDGPTGRRIAVTEELTF- 

CCCGCGACCTTCGCCGACACCGTGTGCGCCGCCTGGCAGTACGCGCACGCGTCGACCGCG 

1201 + + + + + + 1260 

GGGCGCTGGAAGCGGCTGTGGCACACGCGGCGGACCGTCATGCGCGTGCGCAGCTGGCGC 
-6 GAVKASVTHAAQCYACADVA 

CCCACGGCGAGGGCGATCGCCTCGCGTGTGCGGGCGTCGAACGTTCCATGTTCGGCGACG 

1261 + + + + + + 1320 

GGGTGCCGCTCCCGCTAGCGGAGCGCACACGCCCGCAGCTTGCAAGGTACAAGCCGCTGC 
-6 GVALAIAERTRADFTGHEAV 

GCTCCGGTGATCGCGGCGTAGGTTTCCAGGACCACGGGGGAATGGGCCATTCCCCCGTGG 

1321 + + + + + + 1380 

CGAGGCCACTAGCGCCGCATCCAAAGGTCCTGGTGCCCCCTTACCCGGTAAGGGGGCACC 
-6 AGTIAAYTELVVPSHAMGGH- 

ATGTTGAGCACTCGCCCGAACCGCTTCTCCAGTCGGCGCAGGATGTCTCCGCCGGCTGCG 

1381 + + + + + + 1440 

TACAACTCGTGAGCGGGCTTGGCGAAGAGGTCAGCCGCGTCCTACAGAGGCGGCCGACGC 
-6 INLVRGFRKELRRLIDGGAA- 

GGTGCGGTGTCGATGGTGTGGACGGGAATCCGCGGCATGGGAATGCCTCTCCTCGTAGTG 

1441 + + + + + + 1500 

CCACGCCACAGCTACCACACCAGCCCTTAGGCGCCGTACCCTTACGGAGAGGAGCATCAC 
-6-< PATDITHVPIRPM 






ATGGGAGTTCCTCGTCCCTCCAGTCTGCCCAAGCACCTCCCCCGGTGAGCTGTCCCGGCC 

1501 + + + + + + 1560 

TACCCTCAAGGAGCAGGGAGGTCAGACGGGTTCGTGGAGGGGGCCACTCGACAGGGCCGG 

GCCCTCCGGCCCCTTCTAGGCAGGTCGCCCGGTGGTGCGGCCCCAGGACGTCACCTCGCC 

1561 + + + + + + 1620 

CGGGAGGCCGGGGAAGATCCGTCCAGCGGGCCACCACGCCGGGGTCCTGCAGTGGAGCGG 

GCACCACCGGGAGCCCCGAGGGGCGAGGTCAGAGGCCGAGCACCTCCTCGGCCAGGGCGG 

1621 + + + + + + 1680 

CGTGGTGGCCCTCGGGGCTCCCCGCTCCAGTCTCCGGCTCGTGGAGGAGCCGGTCCCGCC 
-5-* * LGLVEEALA 

TGCCCCGAACACGGGCCTCGATCTTGGCGAAGGCCAGGTCGCGTGTGGTGGAGGTGTCGT 

1681 + + + + + + 1740 

ACGGGGCTTGTGCCCGGAGCTAGAACCGCTTCCGGTCCAGCGCACACCACCTCCACAGCA 
-5 TGRVRAEIKAFALDRTTSTD 

CGGCGAACGGGGAGAAGCCGCAGTCGTCGCAGGTTCCCAGTTGCTCGACGGGGATGTAGC 

1741 + + + + + + 1800 

GCCGCTTGCCCCTCTTCGGCGTCAGCAGCGTCCAAGGGTCAACGAGCTGCCCCTACATCG 
-5 DAFPSFGCDDCTGLQEVPIY 

GGGCGGCGAGCAGGATGCGGTCGCGTACCTGCTCGGGGGTCTCGACCACTGGGTCGATCG 

1801 + + + + + + 1860 

CCCGCCGCTCGTCCTACGCCAGCGCATGGACGAGCCCCCAGAGCTGGTGACCCAGCTAGC 
-5 RAALLIRDRVQEPTEVVPDI 

GGTCGGTCACCCCGAGGAAGACGCGGGCGGCAGGGGGCAGGTGGTCACGGACGATGCTCA 

1861 + + + + + + 1920 

CCAGCCAGTGGGGCTCCTTCTGCGCCCGCCGTCCCCCGTCCACCAGTGCCTGCTACGAGT 
-5 PDTVGLFVRAAPPLHDRVIS 

GGACCCGCTCGGGGTCCGCTTCGCCGGCCAGTTCGAGATAGAAGTTGCCCGCCTTGAGCT 

1921 + + + + + + 1980 

CCTGGGCGAGCCCCAGGCGAAGCGGCCGGTCAAGCTCTATCTTCAACGGGCGGAACTCGA 
-5 LVREPDAEGALELYFNGAKL 

GGAAGAGCTTGGGCAGCAGTTCGGCGTAGTCGATGTCGAGGCTGTGCGTGGAGTCCTGGT 

1981 + + + + + + 2040 

CCTTCTCGAACCCGTCGTCAAGCCGCATCAGCTACAGCTCCGACACGCACCTCAGGACCA 
-5 QFLKPLLEAYDIDLSHTSDQ 

CGCCGCCGGGGCAGGTGTGTACGCCGATGCGGGCGGTTTCCTCGGCGCTGAAGCGCCCCA 

2041 + + + + + + 2100 

GCGGCGGCCCCGTCCACACATGCGGCTACGCCCGCCAAAGGAGCCGCGACTTCGCGGGGT 
-5 DGGPCTHVGIRATEEASFRG 

GGACTTCGTTGTTGAGGGCGATGAAGTCGTCGAGGACGCCGCCGCTGGGGTCGAGCTTGA 

2101 + + + + + + 2160 

CCTGAAGCAACAACTCCCGCTACTTCAGCAGCTCCTGCGGCGGCGACCCCAGCTCGAACT 
-5 LVENNLAIFDDLVGGSPDLK 

GGGACAGCCGCCCCTCGGTGAAGTCGAGCTGGACCACGTGTGCCCCCGCGTCCAGGCAGC 

2161 + + + + + + 2220 

CCCTGTCGGCGGGGAGCCACTTCAGCTCGACCTGGTGCACACGGGGGCGCAGGTCCGTCG 
-5 LSLRGETFDLQVVHAGADLC 

CTCGGATGTCGGCTTCGGCCTCGTCGGCGAGGTCGCGCAGGAACTGCTCGCGGGGGTAGC 

2221 + + + + + + 2280 

GAGCCTACAGCCGAAGCCGGAGCAGCCGCTCCAGCGCGTCCTTGACGAGCGCCCCCATCG 
-5 GRIDAEAEDALDRLFQERPY 

CCTCGATGGGAGTGGCGGGGTAGAGGAGGCTGAGGGCGGAGGGTGCGATGACCGCCTGCT 
2281 + + + + + + 2340 
GGAGCTACCCTCACCGCCCCATCTCCTCCGACTCCCGCCTCCCACGCTACTGGCGGACGA 
-5 GEIPTAPYLLSLASPAIVAQ 

TCAGGGGGCGGTCCGTGAGCTGCCGTGCGGCGCGCAGATAGGTTTCGGCCCGCACCTGGT 

2341 + + + + + + 2400 

AGTCCCCCGCCAGGCACTCGACGGCACGCCGCGCGTCTATCCAAAGCCGGGCGTGGACCA 
-5 KLPRDTLQRAARLYTEARVQ 

AGCGGAAGGGCCCTTGGGTGATGCTGGGGAGCTGCCGGGTGTGCCCGTCTGCGAAGGGGA 

2401 + + + + + + 2460 

TCGCCTTCCCGGGAACCCACTACGACCCCTCGACGGCCCACACGGGCAGACGCTTCCCCT 
-5 YRFPGQTISPLQRTHGDAFP 

TGACAGCGCCGTCGGGCGAGAGGGTGTCGAGGCCGGTCACGGGGTAGGTGGCGAAGCTCG 

2461 + + + + + + 2520 

ACTGTCGCGGCAGCCCGCTCTCCCACAGCTCCGGCCAGTGCCCCATCCACCGCTTCGAGC 
-5 IVAGDPSLTDLGTVPYTAFS 

GCTTGGACTGTTCACCGTCCACGAGGACGGGGCTGCCGACTCGTTCCAGTCGTGTCAGGG 

2521 + + + + + + 2580 

CGAACCTGACAAGTGGCAGGTGCTCCTGCCCCGACGGCTGAGCAAGGTCAGCACAGTCCC 
-5 PKSQEGDVLVPSGVRELRTL 

TGTCCGCGACGGCCTGTTCCTGCTGTTTGGCCAGGTCCGTGGCGTCCAGGGTTCCCTGGG 

2581 + + + + + + 2640 

ACAGGCGCTGCCGGACAAGGACGACAAACCGGTCCAGGCACCGCAGGTCCCAAGGGACCC 
-5 TDAVAQEQQKALDTADLTGQ 

CATGCGCGGCAAGGGCGTGCAGGAGTGTCGCGGAGCGCGGAAGGCTGCCGATCGGCTCAG 

2641 + + + + + + 2700 

GTACGCGCCGTTCCCGCACGTCCTCACAGCGCCTCGCGCCTTCCGACGGCTAGCCGAGTC 
-5 AHAALAHLLTASRPLSGIPE 

TGGCGATGGTCATGGCCGAAGAGTAGGGAAGAGGCTGGGTTTCGAACCACCGCAAAGCTT 

2701 + + + + + + 2760 

ACCGCTACCAGTACCGGCTTCTCATCCCTTCTCCGACCCAAAGCTTGGTGGCGTTTCGAA 
-5-< TAITM- 

TGATTGCCGCTTTTTCAGGGGAAGTTGATGCGAAGTCGCCGAGCGGCGGAACGTGCTGAT 

2761 + + + + + + 2820 

ACTAACGGCGAAAAAGTCCCCTTCAACTACGCTTCAGCGGCTCGCCGCCTTGCACGACTA 

GTATGGGGGGCGGGAGGAGCCTGCGGGGTTCTAGGAGCCGGTCGCGGCCACGGTGGAGGA 

2821 + + + + + + 2880 

CATACCCCCCGCCCTCCTCGGACGCCCCAAGATCCTCGGCCAGCGCCGGTGCCACCTCCT 
-4-* *SGTAAVTSS- 

GGTGCCCAGCTGGGAGCGGGGGGTCTTTTCGCCGACGCGGTTGGGCTCGATGGTGCGGGG 

2881 + + + + + + 2940 

CCACGGGTCGACCCTCGCCCCCCAGAAAAGCGGCTGCGCCAACCCGAGCTACCACGCCCC 
-4 TGLQSRPTKEGVRNPEITRP- 

GTCGACGGCCTCTCCGGGGGCACCTTGCCGGTAGACGCCTTCGGGGTCGGAGTCCCGGTC 

2941 + + + + + + 3000 

CAGCTGCCGGAGAGGCCCCCGTGGAACGGCCATCTGCGGAAGCCCCAGCCTCAGGGCCAG 
-4 DVAEGPAGQRYVGEPDSDRD- 

ATGGGGGAGCAGGAAGAAGACCCGGCGCCGGTACAGACCGCTGTCCGGGTCCGCTTCGGC 

3001 + + + + + + 3060 

TACCCCCTCGTCCTTCTTCTGGGCCGCGGCCATGTCTGGCGACAGGCCCAGGCGAAGCCG 
-4 HPLLFFVRRRYLGSDPDAEA- 

GTCGGCCCCGAGTTCGATGTAGCCGATCATGCGGCCGTCGCGGGCGTAGCGCGGCTTGTT 

3061 + + + + + + 3120 

CAGCCGGGGCTCAAGCTACATCGGCTAGTACGCCGGCAGCGCCCGCATCGCGCCGAACAA 
-4 DAGLEIYGIMRGDRAYRPKN- 
CTTGCGCCGGGGGGTCTTGTCCAGGGCCTGGCGGACGTAGTCGAGTCCCTCGGGATCTTC 

3121 + + + + + + 3180 

GAACGCGGCCCCCCAGAACAGGTCCCGGACCGCCTGCATCAGCTCAGGGAGCCCTAGAAG 
-4 KRRPTKDLAQRVYDLGEPDE- 

GAGCCACACGACCTTCGCCTCGTGAACGAGATCGCTGTCGGTCAGTAGCGAGCTCATGGC 

3181 + + + + + + 3240 

CTCGGTGTGCTGGAAGCGGAGCACTTGCTCTAGCGACAGCCAGTCATCGCTCGAGTACCG 
-4-< LWVVKAEHVLDSDTLLSSM- 

GGCGACCTCTCCTTCGTCGGCGTGCACCGGGTGGGGAAGCGGTGCCTGCGTGATGTGTGT 

3241 + + + + + + 3300 

CCGCTGGAGAGGAAGCAGCCGCACGTGGCCCACCCCTTCGCCACGGACGCACTACACACA 

TCGTCTGCGGCGGTGGGCCGCAGTGGTGCGGACCGCCCGTGGTGCCGGTTCTCGGCCAAA 

3301 + + + + + + 3360 

AGCAGACGCCGCCACCCGGCGTCACCACGCCTGGCGGGCACCACGGCCAAGAGCCGGTTT 

GCACGGGCAGGTACGTCCTGGGGCACTCACATCGTAGATGGGGTCCGCTTCCGCAGGGCA 

3361 + + + + + + 3420 

CGTGCCCGTCCATGCAGGACCCCGTGAGTGTAGCATCTACCCCAGGCGAAGGCGTCCCGT 

GTGCCTCCGGTCGGAGGACGTTCATTCGTCGGCTGCCAGAGCGAGGTTGGGGTAGAACTT 

3421 + + + + + + 3480 

CACGGAGGCCAGCCTCCTGCAAGTAAGCAGCCGACGGTCTCGCTCCAACCCCATCTTGAA 
-3-* *EDAALALNPYFK- 

CCGGCCGTTGGATTTGATCATGTCGGCAGGTGAGGCGAGGCCCACTTCCTGGCGGACCCG 

3481 + + + + + + 3540 

GGCCGGCAACCTAAACTAGTACAGCCGTCCACTCCGCTCCGGGTGAAGGACCGCCTGGGC 
-3 RGNSKIMDAPSALGVEQRVR- 

GGTGGCGAAGGCACGGGCGGTCCCGGGGCGGATGCCTTCACTGTGTGCGCACCAGGTGCT 

3541 + + + + + + 3600 

CCACCGCTTCCGTGCCCGCCAGGGCCCCGCCTACGGAAGTGACACACGCGTGGTCCACGA 
-3 TAFARATGPRIGESHACWTS- 

GTAGGACGTGTAGAGAAGGCCCTGTTCGACGCGTAGCTCGCTGTTCTCGGGGTCGTGGAG 

3601 + + + + + + 3660 

CATCCTGCACATCTCTTCCGGGACAAGCTGCGCATCGAGCGACAAGAGCCCCAGCACCTC 
-3 YSTYLLGQEVRLESNEPDHL- 

GCAGCACTCGGCGAGGAAGCGGCCGATGTGGTCCTCGGTGTTCGCGTATGCGCTGGTGGC 

3661 + + + + + + 3720 

CGTCGTGAGCCGCTCCTTCGCCGGCTACACCAGGAGCCACAAGCGCATACGCGACCACCG 
-3 CCEALFRGIHDETNAYASTA- 

GATGCGGACCCGGTCGGGGCCGGCGAGTGTGTCGCGGGTGGCGAGGTAGCGGCGGGCCCC 

3721 + + + + + + 3780 

CTACGCCTGGGCCAGCCCCGGCCGCTCACACAGCGCCCACCGCTCCATCGCCGCCCGGGG 
-3 IRVRDPGALTDRTALYRRAG- 

TTCGGTGAGCCAGTGCAGGATCCCGGGGCCCTCGTCCTGGACGAGTTCGACAGCCAGGTT 

3781 + + + + + + 3840 

AAGCCACTCGGTCACGTCCTAGGGCCCCGGGAGCAGGACCTGCTCAAGCTGTCGGTCCAA 
-3 ETLWHLIGPGEDQVLEVALN- 

GTCGATCTTGCGTTCGTCGGGGACGATCCGTTCGAAGGGCAGGAGGCGGATGCGGCGCCA 

3841 + + + + + + 3900 

CAGCTAGAACGCAAGCAGCCCCTGCTAGGCAAGCTTCCCGTCCTCCGCCTACGCCGCGGT 
-3 DIKREDPVIREFPLLRIRRW- 

GAAGGCGAAGCCGCCGGTGGAGACCTCGGGGCGGTGGTTGCCCAGCAGCCACAGCTTGTG 

3901 + + + + + + 3960 

CTTCCGCTTCGGCGGCCACCTCTGGAGCCCCGCCACCAACGGGTCGTCGGTGTCGAACAC 
-3 FAFGGTSVEPRHNGLLWLKH- 

CGTGGGTGTGAAGGAGAAATAGTCCTGCCGCATGCGGCGGGCCTTGATCTTGTCACCGCC 

3961 + + + + + + 4020 

GCACCCACACTTCCTCTTTATCAGGACGGCGTACGCCGCCCGGAACTAGAACAGTGGCGG 
-3 TPTFSFYDQRMRRAKIKDGG- 

GGTCAGCAGGCGGACGCGCGCCTCGTCGAAGCGGTCGTTGGGCTTGAGCTCGCTGCACAC 

4021 + + + + + + 4080 

CCAGTCGTCCGCCTGCGCGCGGAGCAGCTTCGCCAGCAACCCGAACTCGAGCGACGTGTG 
-3 TLLRVRAEDFRDNPKLESCV- 

GATGAGGCGGCGGCCGTGGAGTTCGGTGAGCTCGGTGGAGTGTTCGGAGTATGCGCCACG 

4081 + + + + + + 4140 

CTACTCCGCCGCCGGCACCTCAAGCCACTCGAGCCACCTCACAAGCCTCATACGCGGTGC 

-3 ILRRGHLETLETSHESYAGR- 

GTCCATGAGGAAACCCGGCGGGGCTGCGTCGGCGTAGTCGCCGAGAATCTGGATCATCAC 

4141 + + + + + + 4200 

CAGGTACTCCTTTGGGCCGCCCCGACGCAGCCGCATCAGCGGCTCTTAGACCTAGTAGTG 
-3 DMLFGPPAADAYDGLIQIMV- 

GTCGAGGAGAACGGATTTGCCGTTCTTTCCCTGGCCGTGGAGAAAGGGCAGCACCTGCGC 

4201 + + + + + + 4260 

CAGCTCCTCTTGCCTAAACGGCAAGAAAGGGACCGGCACCTCTTTCCCGTCGTGGACGCG 
-3 DLLVSKGNKGQGHLFPLVQA- 

CCCGACGTCACCGGTGATGGAGTAGCCGAGAAGGAGGTGGAGGAAGTCGATCATCTCCCG 

4261 + + + + + + 4320 

GGGCTGCAGTGGCCACTACCTCATCGGCTCTTCCTCCACCTCCTTCAGCTAGTAGAGGGC 
-3 GVDGTISYGLLLHLFDIMER- 

CCCTTCGGCGTCACTGCCGAAGGTGTCTTCGAGGAAACGGTGCCAGCGGGGGGTGGGGAT 

4321 + + + + + + 4380 

GGGAAGCCGCAGTGACGGCTTCCACAGAAGCTCCTTTGCCACGGTCGCCCCCCACCCCTA 
-3 GEADSGFTDELFRHWRPTPI- 

GTCCTGGGGGGAGGCGCTGGTGGCGCGGGAGTGGAAGTCCCGGGTGGGGTCGGGCTTGCG 

4381 + + + + + + 4440 

CAGGACCCCCCTCCGCGACCACCGCGCCCTCACCTTCAGGGCCCACCCCAGCCCGAACGC 
-3 DQPSASTARSHFDRTPDPKR- 

CATACGGCCGTTGCGGAGGTCGACCACTCCGTCAGGGGTGCACAGGGCGTAGGGGTCTCC 

4441 + + + + + + 4500 

GTATGCCGGCAACGCCTCCAGCTGGTGAGGCAGTCCCCACGTGTCCCGCATCCCCAGAGG 
-3 MRGNRLDVVGDPTCLAYPDG- 

GTCGAGGGTGTCGGGATCGAGGGAGAGGTCGGGAGAGGCCTTTGCCTGGGTGAGGAGCGC 

4501 + + + + + + 4560 

CAGCTCCCACAGCCCTAGCTCCCTCTCCAGCCCTCTCCGGAAACGGACCCACTCCTCGCG 
-3 DLTDPDLSLDPSAKAQTLLA- 

CTTCATACCGGTCGTCGACAGGGTGCGGCGTTTGTGGTGGTGCAGTTCCCGGTCGGTGAA 

4561 + + + + + + 4620 

GAAGTATGGCCAGCAGCTGTCCCACGCCGCAAACACCACCACGTCAAGGGCCAGCCACTT 
-3 KMGTTSLTRRKHHHLERDTF- 

CAGCCCGCGGGGATCGCTGCCGGGCATCTCCTCCGCCATCTCTCCGGCAGCCCACAGGGC 

4621 + + + + + + 4680 

GTCGGGCGCCCCTAGCGACGGCCCGTAGAGGAGGCGGTAGAGAGGCCGTCGGGTGTCCCG 
-3 LGRPDSGPMEEAMEGAAWLA- 

AGCTTTCTCGCCTCCGGCCCGCTTCCACCGGTAGCCGTCCCAGGAGTACCAGCCCAGGCC 

4681 + + + + + + 4740 

TCGAAAGAGCGGAGGCCGGGCGAAGGTGGCCATCGGCAGGGTCCTCATGGTCGGGTCCGG 
-3 AKEGGARKWRYGDWSYWGLG- 

CTCCACGTGCCGGAACTGGTCACGGTAGAGACGGACGAAGAGCTTGGCGTTGCCGCGGTC 

4741 + + + + + + 4800 

GAGGTGCACGGCCTTGACCAGTGCCATCTCTGCCTGCTTCTCGAACCGCAACGGCGCCAG 
-3 EVHRFQDRYLRVFLKANGRD- 

GGTCAGGCTGGCGGGAATCTCGCCCGCCTCCCAGGCGGTCGCGGCGACGGGGGCCTCGGG 

4801 + + + + + + 4860 

CCAGTCCGACCGCCCTTAGAGCGGGCGGAGGGTCCGCCAGCGCCGCTGCCCCCGGAGCCC 
-3 TLSAPIEGAEWATAAVPAEP- 

AGCGGCCTGGACAGGGAGGAGCGGCGCTGGGGCCGGGGTGGTTTCGAGGGCCAGCATCTG 

4861 + + + + + + 4920 

TCGCCGGACCTGTCCCTCCTCGCCGCGACCCCGGCCCCACCAAAGCTCCCGGTCGTAGAC 
-3 AAQVPLLPAPAPTTELALMQ- 

CTGAGCGGCGGCAGTTGCGTCAAAGCGAGGGCCCTCGGCGCTGCTGCTCATGGACGTCCT 

4921 + + + + + + 4980 

GACTCGCCGCCGTCAACGCAGTTTCGCTCCCGGGAGCCGCGACGACGAGTACCTGCAGGA 
-3-< QAAATADFRPGEASSSM- 

TCGAGATGGAGCGGTCGGGCGGTCCCCGCTGCGGGAACGGCATGAATGATCTTCCCGGTG 

4981 + + + + + + 5040 

AGCTCTACCTCGCCAGCCCGCCAGGGGCGACGCCCTTGCCGTACTTACTAGAAGGGCCAC 

CGGACAGAGTGCCAGGGGCAGCGCATGTGCGGGGGGACAACGGCCCGTTTCGGACGAGGG 

5041 + + + + + + 5100 

GCCTGTCTCACGGTCCCCGTCGCGTACACGCCCCCCTGTTGCCGGGCAAAGCCTGCTCCC 

CCGGCCGACGGGGGGAAGCAGGGGCCGGCAACCGGGTGGCGGGGCGGCGTGAGCGAGGGC 

5101 + + + + + + 5160 

GGCCGGCTGCCCCCCTTCGTCCCCGGCCGTTGGCCCACCGCCCCGCCGCACTCGCTCCCG 

ACGAGCGGCCCGGTACGGGGGGAAGGGCTCGTCTCTCCGTGGGGCGGCACGTTGTGGTCC 

5161 + + + + + + 5220 

TGCTCGCCGGGCCATGCCCCCCTTCCCGAGCAGAGAGGCACCCCGCCGTGCAACACCAGG 

TCGTCCGTCAGCTTGCGTCTGGCTTCAGCCTCCTGACCCCCAATAAGGCGAAAGCTGCTG 

5221 + + + + + + 5280 

AGCAGGCAGTCGAACGCAGACCGAAGTCGGAGGACTGGGGGTTATTCCGCTTTCGACGAC 

GTCAAGCATCTTTCGTGACACTCGGCGAGGGACTGAAGGGACTGTCTTTCGGAATGAGTG 

5281 + + + + + + 5340 

CAGTTCGTAGAAAGCACTGTGAGCCGCTCCCTGACTTCCCTGACAGAAAGCCTTACTCAC 

TAGGGGGTTGTCGGGTGGGGACCGCGCCTCGACTCCCCGGCGGACGGGATCTGTTCGGTC 

5341 + + + + + + 5400 

ATCCCCCAACAGCCCACCCCTGGCGCGGAGCTGAGGGGCCGCCTGCCCTAGACAAGCCAG 

GGTCCCTTGGGTCCCTCCCCGGATCGCGGCAGGGACCCAAGGGGGCGGTGCGGCGGGCGG 

5401 + + + + + + 5460 

CCAGGGAACCCAGGGAGGGGCCTAGCGCCGTCCCTGGGTTCCCCCGCCACGCCGCCCGCC 

TCGGTGAGGGGCCCCGGTGGAGGGACTGAGGGTCTGTATGGAGCGATAAGAGGGTCTGAA 

5461 + + + + + + 5520 

AGCCACTCCCCGGGGCCACCTCCCTGACTCCCAGACATACCTCGCTATTCTCCCAGACTT 

GGGGCGGAGAGAGTTTCGGTCCCTGCGTTGAGTCCCTGGTCATCACCGCAGGTCAGAGGG 

5521 + + + + + + 5580 

CCCCGCCTCTCTCAAAGCCAGGGACGCAACTCAGGGACCAGTAGTGGCGTCCAGTCTCCC 

GTTTTGAGGGGTGAAAAAGGGACTGAAGGGACTCAACTTCCCCATTATGAGCTGAGTAGA 

5581 + + + + + + 5640 

CAAAACTCCCCACTTTTTCCCTGACTTCCCTGAGTTGAAGGGGTAATACTCGACTCATCT 
AGAAAGCAGTATGACGATATCGGCGCCTACATACGCGCGCGTACATAGTGAGCTTATAAT 



5641 + + + + + + 5700 

TCTTTCGTCATACTGCTATAGCCGCGGATGTATGCGCGCGCATGTATCACTCGAATATTA 

GCGGAAGTTGAGTCCCTTCAGTCCCTTTTCGTGGGGTCGTATCCCCTCTGACTGCGTTGA 

5701 + + + + + + 5760 

CGCCTTCAACTCAGGGAAGTCAGGGAAAAGCACCCCAGCATAGGGGAGACTGACGCAACT 

CCGTCGCCGCTCCGCGCAGGGACCGAAGAGGGACCAAGTCCCTGCGCGGGGCGGGCGACG 

5761 + + + + + + 5820 

GGCAGCGGCGAGGCGCGTCCCTGGCTTCTCCCTGGTTCAGGGACGCGCCCCGCCCGCTGC 

GTAATCGTGCAGTGCCCCCTCCCCCGTTTCCCACAGCGAGTCGTCGCTCCCCTGTGAGGC 

5821 + + + + + + 5880 

CATTAGCACGTCACGGGGGAGGGGGCAAAGGGTGTCGCTCAGCAGCGAGGGGACACTCCG 

CGGAGAGGGTCCTAGAACCCCTCAGGGGCCGTTCTGTGGCCCTCTGGGCCTCCTCCTGGC 

5881 + + + + + + 5940 

GCCTCTCCCAGGATCTTGGGGAGTCCCCGGCAAGACACCGGGAGACCCGGAGGAGGACCG 

CATTTACCCCATGGGGGCGCTTGGGGGCGTCAGGAGGGCTTGTGAGGGCTCTGCCGGGAA 

5941 + + + + + + 6000 

GTAAATGGGGTACCCCCGCGAACCCCCGCAGTCCTCCCGAACACTCCCGAGACGGCCCTT 
-2-> MRALPGS- 

GTGGCGGATTGCGCATGGCAGGAGATGCCCCGACAGCGGCCGGGAATCGACGATGTCCCC 

6001 + + + + + + 6060 

CACCGCCTAACGCGTACCGTCCTCTACGGGGCTGTCGCCGGCCCTTAGCTGCTACAGGGG 
-2 GGLRMAGDAPTAAGNRRCPP- 

CGACCCCTATCCAGCGTCCGCTGATCCTCAGGAGGCAGACCTTGCAGGCTCCAGAAGCGA 

6061 + + + + + + 6120 

GCTGGGGATAGGTCGCAGGCGACTAGGAGTCCTCCGTCTGGAACGTCCGAGGTCTTCGCT 
-2 TPIQRPLILRRQTLQAPEAK- 

AGAACGGCCGGTCCCCGGAGCAGCCGCAGGAAGAGCGGATCGTCCTGGACGTATGGCTGG 

6121 + + + + + + 6180 

TCTTGCCGGCCAGGGGCCTCGTCGGCGTCCTTCTCGCCTAGCAGGACCTGCATACCGACC 
-2 NGRSPEQPQEERIVLDVWLA- 

CGAACTACCCGTTCCCCACCTATGACGGGCGTGACTTCCTCGCTCCGCTGCGCGAGCGGG 

6181 + + + + + + 6240 

GCTTGATGGGCAAGGGGTGGATACTGCCCGCACTGAAGGAGCGAGGCGACGCGCTCGCCC 
-2 NYPFPTYDGRDFLAPLRERA- 

CGGCGGAGTTCGAGCGCGCCCACCCCCGATACCGGGTCGACATCAACGGCCACGACTTCT 

6241 + + + + + + 6300 

GCCGCCTCAAGCTCGCGCGGGTGGGGGCTATGGCCCAGCTGTAGTTGCCGGTGCTGAAGA 
-2 AEFERAHPRYRVDINGHDFW- 

GGACCATCCCCGAGAAGGTGGCGCGCGCCACCGCGGAGGGCAGGCCTCCGCACATAGCGG 

6301 + + + + + + 6360 

CCTGGTAGGGGCTCTTCCACCGCGCGCGGTGGCGCCTCCCGTCCGGAGGCGTGTATCGCC 
-2 TIPEKVARATAEGRPPHIAG- 

GCTACTACGCCACCGACAGCCAGTTGGCGCGGGACGCGCGCAGGCCCGACGGGAAGCCGG 

6361 + + + + + + 6420 

CGATGATGCGGTGGCTGTCGGTCAACCGCGCCCTGCGCGCGTCCGGGCTGCCCTTCGGCC 
-2 YYATDSQLARDARRPDGKPV- 

TCTTCACCTCGGTGGAGGCCGCGTTGGCCGGCCGGACGGAGATACTGGGACACCCGGTGG 

6421 + + + + + + 6480 

AGAAGTGGAGCCACCTCCGGCGCAACCGGCCGGCCTGCCTCTATGACCCTGTGGGCCACC 
-2 FTSVEAALAGRTEILGHPVV- 



TGGTGGAGGACCTCGACCCCGTGGTGCGCGACTCCTACTCGTTCGGGGGCGAGTTGGTGT 
6481 + + + + + + 6540 

ACCACCTCCTGGAGCTGGGGCACCACGCGCTGAGGATGAGCAAGCCCCCGCTCAACCACA 
-2 VEDLDPVVRDSYSFGGELVS- 

CGCTGCCGCTCACGGTCACCACCATGCTCTGCTACGCCAACTCCTCCCTCCTCGCGCGCG 

6541 + + + + + + 6600 

GCGACGGCGAGTGCCAGTGGTGGTACGAGACGATGCGGTTGAGGAGGGAGGAGCGCGCGC 
-2 LPLTVTTMLCYANSSLLARA- 

CCGGTGTTCCGGAGTTGCCCCGTACCTGGGATGAGGTCGAAGCAGCCTGCCAGGCGGTGG 

6601 + + + + + + 6660 

GGCCACAAGGCCTCAACGGGGCATGGACCCTACTCCAGCTTCGTCGGACGGTCCGCCACC 
-2 GVPELPRTWDEVEAACQAVA- 

CCAGCGTCGACGGGGGGCCCGGTCACGGAATCACCTGGGCCAACGACGGCTGGGTTTTCC 

6661 + + + + + + 6720 

GGTCGCAGCTGCCCCCCGGGCCAGTGCCTTAGTGGACCCGGTTGCTGCCGACCCAAAAGG 
-2 SVDGGPGHGITWANDGWVFQ- 

AGCAGGCCGTCGCCCTTCAGAACGGGGTGCTGACCGATCAGGACAACGGCCGCTCCGGCT 

6721 + + + + + + 6780 

TCGTCCGGCAGCGGGAAGTCTTGCCCCACGACTGGCTAGTCCTGTTGCCGGCGAGGCCGA 
-2 QAVALQNGVLTDQDNGRSGS- 

CCGCCACGACGGTGGACGTCACATCGGACGAGATGCTGGACTGGGTCCGCTGGTGGACGC 

6781 + + + + + + 6840 

GGCGGTGCTGCCACCTGCAGTGTAGCCTGCTCTACGACCTGACCCAGGCGACCACCTGCG 
-2 ATTVDVTSDEMLDWVRWWTH- 

ACCTCCATGAGCGCGGCCATTACCTCTACACGGGCGGGCCCTCGGACTGGGGCGGGGCGT 

6841 + + + + + + 6900 

TGGAGGTACTCGCGCCGGTAATGGAGATGTGCCCGCCCGGGAGCCTGACCCCGCCCCGCA 
-2 LHERGHYLYTGGPSDWGGAF- 

TCGAGGCTTTCGTCCAGCAGAAGGTCGCATTCACCTTCGACTCGTCCAAGGCCGCCCGGG 

6901 + + + + + + 6960 

AGCTCCGAAAGCAGGTCGTCTTCCAGCGTAAGTGGAAGCTGAGCAGGTTCCGGCGGGCCC 
-2 EAFVQQKVAFTFDSSKAARE- 

AACTCATCCAGGCCGGTGCACAGGCCGGTTTCGAGGTCGCGGTGTTCCCGTTGCCCAGGA 

6961 + + + + + + 7020 

TTGAGTAGGTCCGGCCACGTGTCCGGCCAAAGCTCCAGCGCCACAAGGGCAACGGGTCCT 
-2 LIQAGAQAGFEVAVFPLPRN- 

ACGCGAAGGCCCCGGTAGCGGGCCAGCCCGTCTCGGGAGACTCCCTGTGGCTGGCCGCGG 

7021 + + + + + + 7080 

TGCGCTTCCGGGGCCATCGCCCGGTCGGGCAGAGCCCTCTGAGGGACACCGACCGGCGCC 
-2 AKAPVAGQPVSGDSLWLAAG- 

GACTCGACGAGACCACGCAGGACGGGCTGCTCGCTCTCACCCAGTACCTGATCAGCCCGG 

7081 + + + + + + 7140 

CTGAGCTGCTCTGGTGCGTCCTGCCCGACGAGCGAGAGTGGGTCATGGACTAGTCGGGCC 
-2 LDETTQDGLLALTQYLISPA- 

CCAACGCCGCGGACTGGCACCGCACCAACGGTTTCGTACCGGTGACCGGCGCGGCCGGGG 

7141 + + + + + + 7200 

GGTTGCGGCGCCTGACCGTGGCGTGGTTGCCAAAGCATGGCCACTGGCCGCGCCGGCCCC 
-2 NAADWHRTNGFVPVTGAAGE- 

AACTGCTGGAAGCGACAGGCTGGTTCGACCGCCGGCCGCAGCAACGGGTGGCCGGGGAGC 

7201 + + + + + + 7260 

TTGACGACCTTCGCTGTCCGACCAAGCTGGCGGCCGGCGTCGTTGCCCACCGGCCCCTCG 
-2 LLEATGWFDRRPQQRVAGEQ- 

AGTTGAAGGCGTCCGACCGGTCACCGGCGGCGCTCGGCGCGCTGCTCGGCGACTTCGCGG 
7261 + + + + + + 7320 
TCAACTTCCGCAGGCTGGCCAGTGGCCGCCGCGAGCCGCGCGACGAGCCGCTGAAGCGCC 
-2 LKASDRSPAALGALLGDFAA- 

CCGTCAACGAGGTCATCACCGCAGCGATGGACGATGTCCTGCGCAGTGGAGCGGACCCCG 

7321 + + + + + + 7380 

GGCAGTTGCTCCAGTAGTGGCGTCGCTACCTGCTACAGGACGCGTCACCTCGCCTGGGGC 
-2 VNEVITAAMDDVLRSGADPA- 

CGAAGGCCTTCGCCGAAGCCGGCGTGGCCGCCCAGCAACTGCTCGATGCCTACAACGCCC 

7381 + + + + + + 7440 

GCTTCCGGAAGCGGCTTCGGCCGCACCGGCGGGTCGTTGACGAGCTACGGATGTTGCGGG 
-2 KAFAEAGVAAQQLLDAYNAR- 

GGAACCGCTCCGGATCCGGGACCCCCTCCGCCGTCTGAGATCCGGTACCGGGGCACAGGG 

7441 + + + + + + 7500 

CCTTGGCGAGGCCTAGGCCCTGGGGGAGGCGGCAGACTCTAGGCCATGGCCCCGTGTCCC 
-2-* NRSGSGTPSAV* - 

GCGCCGCCGCCCGCTTTCCCGGCGGGGCACTGGCCGGGGGACATGCTCTCCCGCCCCCGG 

7501 + + + + + + 7560 

CGCGGCGGCGGGCGAAAGGGCCGCCCCGTGACCGGCCCCCTGTACGAGAGGGCGGGGGCC 

CAGGACGTAGGGTCAACCCGCCTGCGCCTTCAGGTGGCGGCGCAGATACTCACCGGTCAG 

7561 + + + + + + 7620 

GTCCTGCATCCCAGTTGGGCGGACGCGGAAGTCCACCGCCGCGTCTATGAGTGGCCAGTC 

-1-* *GAQAKLHRRLYEGTL- 

GGAGGAATCCGCGGCGAGCAGGTCCTTCGGTGTGCCGGTGAAGACGATCTCGCCGCCCTC 

7621 + + + + + + 7680 

CCTCCTTAGGCGCCGCTCGTCCAGGAAGCCACACGGCCACTTCTGCTAGAGCGGCGGGAG 
-1 SSDAALLDKPTGTFVIEGGE- 

CCGTCCCCCGTCGGGACCCAGGTCGATGATCCAGTCGGCCTGCTGCACCACATCGAGGTT 

7681 + + + + + + 7740 

GGCAGGGGGCAGCCCTGGGTCCAGCTACTAGGTCAGCCGGACGACGTGGTGTAGCTCCAA 
-1 RGGDPGLDIIWDAQQVVDLN- 

GTGCTCGATGACCACGACGGTGTTCCCGGCCTCGACGAGCCCGTCCAGGAGCTTCAGCAG 

7741 + + + + + + 7800 

CACGAGCTACTGGTGCTGCCACAAGGGCCGGAGCTGCTCGGGCAGGTCCTCGAAGTCGTC 
-1 HEIVVVTNGAEVLGDLLKLL- 

GGTGTCAACGTCCGACATGTGCAGCCCGGTGGTGGGCTCGTCCAGGACATAGACCGTGCC 

7801 + + + + + + 7860 

CCACAGTTGCAGGCTGTACACGTCGGGCCACCACCCGAGCAGGTCCTGTATCTGGCACGG 
-1 TDVDSMHLGTTPEDLVYVTG- 

CGTGCGGTGCAGCTGGTCGGCAAGTTTGATCCGCTGCAGTTCACCGCCGGAGAGGCTGGA 

7861 + + + + + + 7920 

GCACGCCACGTCGACCAGCCGTTCAAACTAGGCGACGTCAAGTGGCGGCCTCTCCGACCT 
-1 TRHLQDALKIRQLEGGSLSS- 

AAGCGGCTGGCCCAGGCTGAGGTACCCAAGACCGACGTCGACGAGAGCGCGCAGTTTCGG 

7921 + + + + + + 7980 

TTCGCCGACCGGGTCCGACTCCATGGGTTCTGGCTGCAGCTGCTCTCGCGCGTCAAAGCC 
-1 LPQGLSLYGLGVDVLARLKP- 

CAGCAGGGCCTTCTCGGTGAAGAACTCGACGGCCTCGTCGGCGGGCAGCTCCAGGACGTC 

7981 + + + + + + 8040 

GTCGTCCCGGAAGAGCCACTTCTTGAGCTGCCGGAGCAGCCGCCCGTCGAGGTCCTGCAG 
-1 LLAKETFFEVAEDAPLELVD- 

CGCGATCGACTTCCCGCGAAGCTGGTGCTCCAGGACCTCGGGCTTGAAGCGGCGCCCCTC 

8041 + + + + + + 8100 

GCGCTAGCTGAAGGGCGCTTCGACCACGAGGTCCTGGAGCCCGAACTTCGCCGCGGGGAG 
-1 AISKGRLQHELVEPKFRRGE- 
ACAGACACCGCAGTGCGTGGTCACCGGATCCATGAAGGCCAGCTCGGTGATGATGACCCC 



8101 + + + + + + 8160 

TGTCTGTGGCGTCACGCACCAGTGGCCTAGGTACTTCCGGTCGAGCCACTACTACTGGGG 
-1 CVGCHTTVPDMFALETIIVG- 

GCGGCCCTGGCACTCCTCGCACGACCCCTTGGAGTTGAAGCTGAACAGCGAGGCGTTCGC 

8161 + + + + + + 8220 

CGCCGGGACCGTGAGGAGCGTGCTGGGGAACCTCAACTTCGACTTGTCGCTCCGCAAGCG 
-1 RGQCEECSGKSNFSFLSANA- 

GCCGGTCTCCTTCGCGAACAGCTTGCGCAGCGGGTCCATCAGGCCGAGGTAGGAGACCGG 

8221 + + + + + + 8280 

CGGCCAGAGGAAGCGCTTGTCGAACGCGTCGCCCAGGTAGTCCGGCTCCATCCTCTGGCC 
-1 GTEKAFLKRLPDMLGLYSVP- 

TGTGGAGCGCGACGAGGCGGCGATCGCGGACTGGTCGACAAAGACCGCGTCGGGGTGCGC 

8281 + + + + + + 8340 

ACACCTCGCGCTGCTCCGCCGCTAGCGCCTGACCAGCTGTTTCTGGCGCAGCCCCACGCG 
-1 TSRSSAAIASQDVFVADPHA- 

CTCCATGAATGCCCCGGAGATCAGGCTGCTCTTGCCGGAACCCGCCACCCCGGTCACCGC 

8341 + + + + + + 8400 

GAGGTACTTACGGGGCCTCTAGTCCGACGAGAACGGCCTTGGGCGGTGGGGCCAGTGGCG 
-1 EMFAGSILSSKGSGAVGTVA- 

GGTCAGCACACCGGTGGGCACGGCCACGGAGACCTGCTTCAGGTTGTGGAGATCCGCGTT 

8401 + + + + + + 8460 

CCAGTCGTGTGGCCACCCGTGCCGGTGCCTCTGGACGAAGTCCAACACCTCTAGGCGCAA 
-1 TLVGTPVAVSVQKLNHLDAN- 

CTCCACGGTCAGCTCCCCCGTGGGCGGGCGGACCTCCTCCTTCACGCGGGCCCCCCGCCG 

8461 + + + + + + 8520 

GAGGTGCCAGTCGAGGGGGCACCCGCCCGCCTGGAGGAGGAAGTGCGCCCGGGGGGCGGC 
-1 EVTLEGTPPRVEEKVRAGRR- 

CAGAGCCTCCCCGGTCCGGGTCTTCGCCTTCCGCAGCTTCGCGAAGGACCCCTCGAACAC 

8521 + + + + + + 8580 

GTCTCGGAGGGGCCAGGCCCAGAAGCGGAAGGCGTCGAAGCGCTTCCTGGGGAGCTTGTG 
-1 LAEGTRTKAKRLKAFSGEFV- 

GATCTCGCCCCCGTGCACTCCCGCCCCGGGACCGACATCGACGATGTGGTCGGCGATCTC 

8581 + + + + + + 8640 

CTAGAGCGGGGGCACGTGAGGGCGGGGCCCTGGCTGTAGCTGCTACACCAGCCGCTAGAG 
-1 IEGGHVGAGPGVDVIHDAIE- 

GATCACAtcGGGGTCGTGCTCGACGACCAGCACGGTGTTCCCCTTGTCGCGCAGCGCGCG 

8641 + + + + + + 8700 

CTAGTGTagCCCCAGCACGAGCTGCTGGTCGTGCCACAAGGGGAACAGCGCGTCGCGCGC 
-1 IVDPDHEVVLVTNGKDRLAR- 

CAGCAGGTCGTTGAGCCGCCCCACGTCGCGCGGGTGCAGGCCGATGCTGGGCTCGTCGAA 

8701 + + + + + + 8760 

GTCGTCCAGCAACTCGGCGGGGTGCAGCGCGCCCACGTCCGGCTACGACCCGAGCAGCTT 
-1 LLDNLRGVDRPHLGISPEDF- 

GATGTACGTGAGCCCGGCCAGACCACTGCCGAGGTGGCGCACCATCTTCAGCCGCTGCCC 

8761 + + + + + + 8820 

CTACATGCACTCGGGCCGGTCTGGTGACGGCTCCACCGCGTGGTAGAAGTCGGCGACGGG 
-1 IYTLGALGSGLHRVMKLRQG- 

CTCGCCCCCCGAGAGGTCGGCCGTGGGCCTGTCCAGGGTCAGGTAGCCGAGCCCGATGGA 

8821 + + + + + + 8880 

GAGCGGGGGGCTCTCCAGCCGGCACCCGGACAGGTCCCAGTCCATCGGCTCGGGCTACCT 
-1 EGGSLDATPRDLTLYGLGIS- 
CACGATCCGCTCCAGGGCCGTGCGCGCGGCTTTCGCGAGAGGGGCAGCGGCCGGCTCCGT 

8881 + + + + + + 8940 

GTGCTAGGCGAGGTCCCGGCACGCGCGCCGAAAGCGCTCTCCCCGTCGCCGGCCGAGGCA 
-1 VI RELATRAAKALPAAAPET- 

GACGCCGGCGAGCACCTCCGTGAGGTCGCGGACCTCCATGCTCGAGTAGTCGGCGATGTT 

8941 + + + + + + 9000 

CTGCGGCCGCTCGTGGAGGCACTCCAGCGCCTGGAGGTACGAGCTCATCAGCCGCTACAA 
-1 VGALVETLDRVEMS SYDAIN- 

CTTGCCGTCGATCCGGACGTCGAGCGCGGCGGCGTTGAGCCGCGCGCCCCGGCAGGAGGG 

9001 + + + + + + 9060 

GAACGGCAGCTAGGCCTGCAGCTCGCGCCGCCGCAACTCGGCGCGCGGGGCCGTCCTCCC 
-1 KGD I RVDLAAANLRAGRC S P - 

ACAGACTCCGTCgGTGACGAAACGTTCGATGACCTCGCGCTTGCGGTCGcTCAGCGCGCT 

9061 + + + + + + 9120 

TGTCTGAGGCAGcCACTGCTTTGCAAGCTACTGGAGCGCGAACGCCAGCgAGTCGCGCGA 
-1 CVGDTVFRE IVERKRDSLAS- 

GAGGTCGCGCTTGAGGTTGAgCCGCTCGAACCGGTCGGCcAACCCCTCGTAGTTCGTCTG 

9121 + + + + + + 9180 

CTCCAGCGCGAACTCCAACTcGGCGAGCTTGGCCAGCCGgTTGGGGAGCATCAAGCAGAC 
-1 LDRKLNLRE FRDALGEYNTQ- 

GAACTCGGTGCTCTTGGTCTTCAGCGTcACCTTCCCGCCGGTGcCGCGCAGCAGCGTGTC 

9181 + + + + + + 9240 

CTTGAGCCACGAGAACCAGAAGTCGCAgTGGAAGGGCGGCCACgGCGCGTCGTCGCACAG 
-1 FETS KTKLTVKGGTGRLLTD- 

CAGCTCCTCGGCGCTGTACTCGGCGATCGGCTTGGCCGGATCCAGACGGCCGGACTTCGC 

9241 + + + + + + 9300 

GTCGAGGAGCCGCGACATGAGCCGCTAGCCGAACCGGCCTAGGTCTGCCGGCCTGAAGCG 
-1 LEEASYEAI PKAPDLRGSKA- 

CCAGATCTGCCAGTCCGGGCTACCCACCTTGTACTCGGGGAAAAGGACCGCCCCGTCGTC 

9301 + + + + + + 9360 

GGTCTAGACGGTCAGGCCCGATGGGTGGAACATGAGCCCCTTTTCCTGGCGGGGCAGCAG 
-1 W I QWD PSGVKYE P FLVAGDD- 

CAGGGACTTCGAGCGGTCCAGCATCTTGTCCAGGTCGAGGGCGATGCTCTGGCCGAGACC 

9361 + + + + + + 9420 

GTCCCTGAAGCTCGCCAGGTCGTAGAACAGGTCCAGCTCCCGCTACGAGACCGGCTCTGG 
-1 LS KSRDLMKDLDLAI SQGLG- 

GTCGCAGTCCGGGCACATGCCCTGGGGGTCGTTGAACGAGAACGCGGAGACGCCGAGCGA 

9421 + + + + + + 9480 

CAGCGTCAGGCCCGTGTACGGGACCCCCAGCAACTTGCTCTTGCGCCTCTGCGGCTCGCT 
-1 DCDPCMGQPDNFS FASVGLS- 

GGACGGCCCGTCGTCCTTCGTCGTGCCGAACCGTGCGAACAGGGCCCGGATCATCGGCTG 

9481 + + + + + + 9540 

CCTGCCGGGCAGCAGGAAGCAGCACGGCTTGGCACGCTTGTCCCGGGCCTAGTAGCCGAC 
-1 SPGDDKTTGFRAFLARIMPQ- 

TACGTCCGTCATGGTCCCCACCGTGGACCGGGCGTTGCCCCCCACGGGCTTCTGGTCGAC 

9541 + + + + + + 9600 

ATGCAGGCAGTACCAGGGGTGGCACCTGGCCCGCAACGGGGGGTGCCCGAAGACCAGCTG 
-1 VDTMTGVTS RANGGVP KQDV- 

GATCACCGGGGTGGTGAGGTTCTCGATCGCCTCGGCCTGAGGACGTTCGTACTTCGGAAG 

9601 + + + + + + 9660 

CTAGTGGCCCCACCACTCCAAGAGCTAGCGGAGCCGGACTCCTGCAAGCATGAAGCCTTC 
-1 IVPTTLNE IAEAQPREYKPL- 



CTGGTTGCGGATGTACCAGCTGAAGGTGGAGTTCAGCTGTCGCTGGGCCTCCACGGCCAC 
9661 + + + + + + 9720 

GACCAACGCCTACATGGTCGACTTCCACCTCAAGTCGACAGCGACCCGGAGGTGCCGGTG 
-1 QNR I YWS FTSNLQRQAEVAV- 

CGTGTCGAAGACGATCGACGACTTGCCCGAACCCGAGACCCCCGTGAAGACCGTGATCTG 

9721 + + + + + + 9780 

GCACAGCTTCTGCTAGCTGCTGAACGGGCTTGGGCTCTGGGGGCACTTCTGGCACTAGAC 
-1 TDFVI SSKGSGSVGTFVTIQ- 

GTTGCGGGGAATCGTCAGGGAGACATCTTTGAGGTTGTGGATCCGCGCGCCCGCGATGCG 

9781 + + + + + + 9840 

CAACGCCCCTTAGCAGTCCCTCTGTAGAAACTCCAACACCTAGGCGCGCGGGCGCTACGC 
-1 NRPITLSVDKLNHI RAGAIR- 

GATGCCGTCTCCCGGGCCGGATGTTTTTCCCGCGCCGGCGGTGGGGTCGGTGACGCTCAC 

9841 + + + + + + 9900 

CTACGGCAGAGGGCCCGGCCTACAAAAAGGGCGCGGCCGCCACCCCAGCCACTGCGAGTG 
-1-< IGDGPGSTKGAGATPDTVSN- 

AGAGTTTTCCTCCTGGCTTCCGTACATGATTTACCGTGTCAGCCGGGCAAACCGGCGGAA 

9901 + + + + + + 9960 

TCTCAAAAGGAGGACCGAAGGCATGTACTAAATGGCACAGTCGGCCCGTTTGGCCGCCTT 

CGGTAACCACCTAGCTTGTACTCAGGAGGTGTCCGGGGTCTTCTCCTCCCGTGCTGACTT 

9961 + + + + + + 10020 

GCCATTGGTGGATCGAACATGAGTCCTCCACAGGCCCCAGAAGAGGAGGGCACGACTGAA 
0-* * STDPTKEERASK- 

GGGGGCCGGCCCGCCGGACAGGGCCGGCTCCGTGTTCCACCCCGCCAGCCGATCCCCCCG 

10021 + + + + + + 10080 

CCCCCGGCCGGGCGGCCTGTCCCGGCCGAGGCACAAGGTGGGGCGGTCGGCTAGGGGGGC 
0 PAPGGSLAPETNWGALRDGR- 

CTCCGTCTCGTCCTCCTCGAGAACGATCCGGCTGCTCGCCCAGCGCAGGATCGGCGGCGC 

10081 + + + + + + 10140 

GAGGCAGAGCAGGAGGAGCTCTTGCTAGGCCGACGAGCGGGTCGCGTCCTAGCCGCCGCG 
0 ETEDEELVIRSSAWRLIPPA- 

CGTCACCGAGGTGATGAGGGCGACCAGCACGATGATCGTGAAGGTCACGGTGTCCAGTAC 

10141 + + + + + + 10200 

GCAGTGGCTCCACTACTCCCGCTGGTCGTGCTACTAGCACTTCCAGTGCCACAGGTCATG 
0 TVST I LAVLVI ITFTVTDLV- 

GCCGATACGCAGGCCGACCAGGGCGATCACCACCTCGATCATTCCACGCGAGTTCATCCC 

10201 + + + + + + 10260 

CGGCTATGCGTCCGGCTGGTCCCGCTAGTGGTGGAGCTAGTAAGGTGCGCTCAAGTAGGG 
0 G I RLGVLAIVVE IMGRSNMG- 

CGCTCCGAGCGCCAGCCCCTCGTAGCGGCTCATCCCGCCACTACGGGCGGCGACGTACGC 

10261 + + + + + + 10320 

GCGAGGCTCGCGGTCGGGGAGCATCGCCGAGTAGGGCGGTGATGCCCGCCGCTGCATGCG 
0 AGLALGEYRSMGGSRAAVYA- 

ACCGGCGAACTTGCCGAAAGTGGCCACCAACAGCACCCCGAGGCCCGTGAGCAGCACCGA 

10321 + + + + + + 10380 

TGGCCGCTTGAACGGCTTTCACCGGTGGTTGTCGTGGGGCTCCGGGCACTCGTCGTGGCT 
0 GAFKGFTAVLLVGLGTLLVS- 

CGGCTCCGCGAGTGCGGTCAGGTCCATGCGAAGCCCCACACTGCCCAGGAACACCGGTGC 

10381 + + + + + + 10440 

GCCGAGGCGCTCACGCCAGTCCAGGTACGCTTCGGGGTGTGACGGGTCCTTGTGGCCACG 
0 PEALATLDMRLGVSGLFVPA- 

GAACACGGCCATGACCAGCGTGCGCAGCGGGGCGAGCCGTACCGGGGCGATGTGCCTCAG 

10441 + + + + + + 10500 

CTTGTGCCGGTACTGGTCGCACGCGTCGCCCCGCTCGGCATGGCCCCGCTACACGGAGTC 
0 FVAMVLTRL PALRVPAI HRL- 

CAGGGTCGCACCGGCCACGAACGCCCCGAACAACGCCTCCATCCCGGCCGCCGCGGTCAG 

10501 + + + + + + 10560 

GTCCCAGCGTGGCCGGTGCTTGCGGGGCTTGTTGCGGAGGTAGGGCCGGCGGCGCCAGTC 
0 LTAGAVFAGFLAEMGAAATL- 

CGCCCCGTACAGGACGACCACGGCCACGCCGACGGTGACGGCCGATACGGGGACCCGGCT 

10561 + + + + + + 10620 

GCGGGGCATGTCCTGCTGGTGCCGGTGCGGCTGCCACTGCCGGCTATGCCCCTGGGCCGA 
0 AGYLVVVAVGVTVASVPVRS- 

GTCACCCGTACGGGACAGCCGCCTGCCGATCGGGCCGCCCACCGCACACGCCGCGGCGAC 

10621 + + + + + + 10680 

CAGTGGGCATGCCCTGTCGGCGGACGGCTAGCCCGGCGGGTGGCGTGTGCGGCGCCGCTG 
0 DGTRSLRRGI PGGVACAAAV- 

GAAGACGGTCGTCCAGGCCATCGTGGTCAGGACCACGGGCCCCCCGGCCGCCCCACTCGC 

10681 + + + + + + 10740 

CTTCTGCCAGCAGGTCCGGTAGCACCAGTCCTGGTGCCCGGGGGGCCGGCGGGGTGAGCG 
0 FVTTWAMTTLVVPGGAAGSA- 

CAGCGCCGTCACCAGAGCGAGCAGCAGCCAGCCCACCGCGTCGTCGAACACCGCTGCCGC 

10741 + + + + + + 10800 

GTCGCGGCAGTGGTCTCGCTCGTCGTCGGTCGGGTGGCGCAGCAGCTTGTGGCGACGGCG 
0 LATVLALLLWGVADD FVAAA- 

GATGAGCAGCTGGCCGACGTTGCGGTGCGTCAGATTCAGGTCGGCGAGCGTCTTGGCGAT 

10801 + + + + + + 10860 

CTACTCGTCGACCGGCTGCAACGCCACGCAGTCTAAGTCCAGCCGCTCGCAGAACCGCTA 
0 ILLQGVNRHTLNLDALTKAI- 

CACCGGGAGGGCCGTGACACACATCGCGACCCCGAGGAACAGCGCGAAGACGCCCCGCTC 

10861 + + + + + + 10920 

GTGGCCCTCCCGGCACTGTGTGTAGCGCTGGGGCTCCTTGTCGCGCTTCTGCGGGGCGAG 
0 VPLATVCMAVGL FLAFVGRE- 

TCCGGAGTCCGCGAGCAGCGAGGCGGGCACCAGGTAGCCGGTGGCGATGCCCAGCCCCAG 

10921 + + + + + + 10980 

AGGCCTCAGGCGCTCGTCGCTCCGCCCGTGGTCCATCGGCCACCGCTACGGGTCGGGGTC 
0 GSDALLSAPVLYGTAIGLGL- 

AGGAATCAGAAGACCCGCCAGGCTGACCCGGGCGGCCAGACCCCCGCGCTTGCGCAGGAT 

10981 + + + + + + 11040 

TCCTTAGTCTTCTGGGCGGTCCGACTGGGCCCGCCGGTCTGGGGGCGCGAACGCGTCCTA 
0 P I LLGALSVRAALGGRKRL I- 

CCGGGGGTCGAACTGGGCACCTGCGATGGCCACCAGCAGAAGGACGCCGAACTGGCAGAA 

11041 + + + + + + 11100 

GGCCCCCAGCTTGACCCGTGGACGCTACCGGTGGTCGTCTTCCTGCGGCTTGACCGTCTT 
0 RPDFQAGAIAVLLLVGFQCF- 

CGCGTCGAGCAGGTGCGCCTGCGAGATGTCCTCGGGAAACAGCCTGCCGGAAAGTCCCGG 

11101 + + + + + + 11160 

GCGCAGCTCGTCCACGCGGACGCTCTACAGGAGCCCTTTGTCGGACGGCCTTTCAGGGCC 
0 ADLLHAQS IDEPFLRGSLGP- 

CGAGATCTGCCCCAGCAGGGTCGGCCCGAGCAGTACCCCCGCGGTCAGCTCCCCCACCAG 

11161 + + + + + + 11220 

GCTCTAGACGGGGTCGTCCCAGCCGGGCTCGTCATGGGGGCGCCAGTCGAGGGGGTGGTC 
0 S I QGLLTPGLLVGATLEGVL- 

CGGCGGCAGACCGATCCGGGTCCCCAGCCGTCCCAGACCGTAGGCACAGGCGAGCAGGAG 

11221 + + + + + + 11280 

GCCGCCGTCTGGCTAGGCCCAGGGGTCGGCAGGGTCTGGCATCCGTGTCCGCTCGTCCTC 
0 PPLGIRTGLRGLGYACALLL- 
GCCGACCTGGAGCAGGAAGACCGTCAGCGGCTCCCCGCCCAGCGGCGACGTGGCTGCGAG 

11281 + + + + + + 11340 

CGGCTGGACCTCGTCCTTCTGGCAGTCGCCGAGGGGCGGGTCGCCGCTGCACCGACGCTC 
0 GVQLLFVTLPEGGLPSTAAL- 

CACAGCCACGTCAGGACCGCGCACCGGGAACCCAGCCCAGCCCGTCCGTCGACGCGGCCA 

11341 + + + + + + 11400 

GTGTCGGTGCAGTCCTGGCGCGTGGCCCTTGGGTCGGGTCGGGCAGGCAGCTGCGCCGGT 
0-< V A V - 

11-* * SRAGPVWGLGDTSAA 

GACCCCCCTGCCTCACCGGTCGCTCGGCCCCCGCCTCATCCCCCAGAAGAGCCCGTGCCT 

11401 + + + + + + 11460 

CTGGGGGGACGGAGTGGCCAGCGAGCCGGGGGCGGAGTAGGGGGTCTTCTCGGGCACGGA 
11 LGGQRVPREAGAEDGLLARA 

GCAGTGCGGCGCTCTGCTCCATGAGGCGGCCCACCACCTTTCCCGGCACGGCGCCGTGCG 

11461 + + + + + + 11520 

CGTCACGCCGCGAGACGAGGTACTCCGCCGGGTGGTGGAAAGGGCCGTGCCGCGGCACGC 
11 QLAASQEMLRGVVKGPVAGH 

GCCCGTCGGCGTCGCCCGCAGCGGTGTGCGTCATGCCGGCCATCTCGTCGGACGCCTCGG 

11521 + + + + + + 11580 

CGGGCAGCCGCAGCGGGCGTCGCCACACGCAGTACGGCCGGTAGAGCAGCCTGCGGAGCC 
11 PGDADGAATHTMGAMED SAE 

AGAACCGCTGCCTGGCCCGGGCCGTGTCGGCGAACTCGTCGGAGGAGACCCCGCCGATCA 

11581 + + + + + + 11640 

TCTTGGCGACGGACCGGGCCCGGCACAGCCGCTTGAGCAGCCTCCTCTGGGGCGGCTAGT 
11 SFRQRARATDAFEDSSVGGI 

GTTCGACGAAGGACTGCAGGTCGGAGTCCGCGGTGTTGGAGATCTTCCGGGCCTGCCAGA 

11641 + + + + + + 11700 

CAAGCTGCTTCCTGACGTCCAGCCTCAGGCGCCACAACCTCTAGAAGGCCCGGACGGTCT 

11 LEVFSQLDSDATNS IKRAQW 

AATAGGAGTCCTCCGAATGGTGCATGTCGTAGAAGCCGACCAGGAACTCGTAGAAGCGGC 

11701 + + + + + + 11760 

TTATCCTCAGGAGGCTTACCACGTACAGCATCTTCGGCTGGTCCTTGAGCATCTTCGCCG 
11 FYSDESHHMDYFGVLFEYFR 

CGTACTCCAGCCGGTAGCGGGCCTCGAACTCCTCGAACGCGCTGGTCTCGTCGACCGACC 

11761 + + + + + + 11820 

GCATGAGGTCGGCCATCGCCCGGAGCTTGAGGAGCTTGCGCGACCAGAGCAGCTGGCTGG 
11 GYELRYRAEFEEFASTEDVS 

CGTCCAGGCAGGAGTTGAGCGAGCGCGCTGCCAGCAGTCCGCTGTAGGTGGCGAGGTGCA 

11821 + + + + + + 11880 

GCAGGTCCGTCCTCAACTCGCTCGCGCGACGGTCGTCAGGCGACATCCACCGCTCCACGT 
11 GDLCSNLSRAALLGSYTALH 

CCCCGGAGGAGAACACCGGGTCGACGAAGCACGCGGCATCCCCGACCAGGGCCATGCCCG 

11881 + + + + + + 11940 

GGGGCCTCCTCTTGTGGCCCAGCTGCTTCGTGCGCCGTAGGGGCTGGTCCCGGTACGGGC 
11 VGSSFVPDVFCAADGVLAMG 

GCGCCCAGAACTTCGTGTTGCTGTACGACCAGTCCTTGCGGACCCGGAGCTCGCCGTAGG 

11941 + + + + + + 12000 

CGCGGGTCTTGAAGCACAACGACATGCTGGTCAGGAACGCCTGGGCCTCGAGCGGCATCC 
11 PAWFKTNSYSWDKRVRLEGY 

GGCCCTCGGTCACCCGGGTGGCCTCGGAGAGCTTCTCCGCGATCAGCGGGCAGGCCGCGA 

12001 + + + + + + 12060 

CCGGGAGCCAGTGGGCCCACCGGAGCCTCTCGAAGAGGCGCTAGTCGCCCGTCCGGCGCT 
11 PGETVRTAESLKEAI LPCAA 
TGAACGACTCCATCGCCTTCTCGGGGTCGCCCTGCACCAGGCTCGCCGAGTCCCGGTTCA 

12061 + + + + + + 12120 

11 I FSEMAKEPDGQVLSASDRN 

CCACTGCGCCGACACTCGTCAGCTCGGGAGACAGGGGTATGTACCAGAACCACCCGTGCT 

12121 + + + + + + 12180 

GGTGACGCGGCTGTGAGCAGTCGAGCCCTCTGTCCCCATACATGGTCTTGGTGGGCACGA 
11 VVAGVSTLEPSLPIYWFWGH 

CGAAGGTGCAGGTGAAGATGTTCCCGGAGTTCGGCTTCGGAAGCCGCTTGCCGCCGTTGA 

12181 + + + + + + 12240 

GCTTCCACGTCCACTTCTACAAGGGCCTCAAGCCGAAGCCTTCGGCGAACGGCGGCAACT 
11 EFTCTFINGSNPKPLRKGGN 

AGTAGCCGAACAGGGCCAGGTTGCGGAAGAAGGGCGAGTACTCGCGCTTGGCGCCCGACT 

12241 + + + + + + 12300 

TCATCGGCTTGTCCCGGTCCAACGCCTTCTTCCCGCTCATGAGCGCGAACCGCGGGCTGA 
11 FYGFLALNRFFPSYERKAGS 

TCTTGTACAGCCCACCGGTGTTGCCGGAGGCGTCCACGACGAAACGGGAGCCCACCTCGT 

12301 + + + + + + 12360 

AGAACATGTCGGGTGGCCACAACGGCCTCCGCAGGTGCTGCTTTGCCCTCGGGTGGAGCA 
11 KKYLGGTNGSADVVFRS GVE 

GCTCGCGCCCCTCGGAGTCCCGGTAGCGCACGCCCCGCACCCGGCCGTCCTCGGCCTTGA 

12361 + + + + + + 12420 

CGAGCGCGGGGAGCCTCAGGGCCATCGCGTGCGGGGCGTGGGCCGGCAGGAGCCGGAACT 

11 HERGESDRYRVGRVRGDEAK 

GCACGTCGAGGACATCGCTGTTCTCCCGCACCTCGACACCGTGCCTGCGAGCGTTGTCGA 

12421 + + + + + + 12480 

CGTGCAGCTCCTGTAGCGACAAGAGGGCGTGGAGCTGTGGCACGGACGCTCGCAACAGCT 
11 LVDLVDSNERVEVGHRRAND 

GCAGGATCTGGTCGAACTTCATGCGCTCGACCTGGTACGCGTACCCCGTCGCCCCCGGCA 

12481 + + + + + + 12540 

CGTCCTAGACCAGCTTGAAGTACGCGAGCTGGACCATGCGCATGGGGCAGCGGGGGCCGT 

11 LLIQDFKMREVQYAYGTAGP 

TCCGGCGCGAGACGGCGAAGTCGAACGTCCACGGTTCGGGGTTGGCACCCCACTTGAACG 

12541 + + + + + + 12600 

AGGCCGCGCTCTGCCGCTTCAGCTTGCAGGTGCCAAGCCCCAACCGTGGGGTGAACTTGC 
11 MRRSVAFDFTWPEPNAGWKF 

TCCCGCCGTGCTTGATCGTGAAGGCTGCCTTCTTCAGCTCGTCGGAGACACCGAGGAGGT 

12601 + + + + + + 12660 

AGGGCGGCACGAACTAGCACTTCCGACGGAAGAAGTCGAGCAGCCTCTGTGGCTCCTCCA 
11 TGGHKITFAAKKLEDSVGLL 

GTGCGATGCCGTGGACGGTGGAGGGGAGGAGCGACTCACCGATCTGGTAGCGCGGGAAGG 

12661 + + + + + + 12720 

CACGCTACGGCACCTGCCACCTCCCCTCCTCGCTGAGTGGCTAGACCATCGCGCCCTTCC 

11 HAIGHVTSPLLSEGIQYRPF 

TCTCCTTCTCCAGCTGGAGTACGCGATGGCCCCGCTTGCGGACCAGCGTGGAGACGGTCG 

12721 + + + + + + 12780 

AGAGGAAGAGGTCGACCTCATGCGCTACCGGGGCGAACGCCTGGTCGCACCTCTGCCAGC 
11 TEKELQLVRHGRKRVLTSVT 

AGCCCGCCGGACCTCCGCCGACCACGATGACGTCGTACTGCGCTGACACGTCCACGGACT 

12781 + + + + + + 12840 

TCGGGCGGCCTGGAGGCGGCTGGTGCTACTGCAGCATGACGCGACTGTGCAGGTGCCTGA 

11-< SGAPGGGVVIVDYQASVDM 

CTCCTTCTCGCACATCGGGCGTCTCATATTCCCAGGAATCCTCTGGCCCGCCCAGGTGCT 
12841 + + + + + + 12900 

GAGGAAGAGCGTGTAGCCCGCAGAGTATAAGGGTCCTTAGGAGACCGGGCGGGTCCACGA 

GCCGCATCTTCGGTATTGCGAAGTCGTGGGCATTCTGCGAGAAGCATGAACCGCGTGGCC 

12901 + + + + + + 12960 

CGGCGTAGAAGCCATAACGCTTCAGCACCCGTAAGACGCTCTTCGTACTTGGCGCACCGG 

CGGTCTACAGTGGCGTGGAATTTCAGTGATTGCGCTGAAGGGCGGCACACGATGAAGGCA 

12961 + + + + + + 13020 

GCCAGATGTCACCGCACCTTAAAGTCACTAACGCGACTTCCCGCCGTGTGCTACTTCCGT 
10-> M K A 

CTTGTACTGTCGGGTGGTTCGGGGACCCGCCTGCGCCCGATCAGTTACGCCATGCCGAAG 

13021 + + + + + + 13080 

GAACATGACAGCCCACCAAGCCCCTGGGCGGACGCGGGCTAGTCAATGCGGTACGGCTTC 
10 LVLSGGSGTRLRPI SYAMPK 

CAGCTCGTTCCGATCGCCGGGAAGCCAGTCCTTGAATATGTTCTGGATAATATCCGGAAC 

13081 + + + + + + 13140 

GTCGAGCAAGGCTAGCGGCCCTTCGGTCAGGAACTTATACAAGACCTATTATAGGCCTTG 
10 QLVPIAGKPVLEYVLDNIRN 

CTCGATATCAAAGAGGTCGCCATTGTCGTCGGTGACTGGGCTCAGGAAATTATTGAGGCA 

13141 + + + + + + 13200 

GAGCTATAGTTTCTCCAGCGGTAACAGCAGCCACTGACCCGAGTCCTTTAATAACTCCGT 
10 LDIKEVAIVVGDWAQEI IEA 

ATGGGTGACGGCAGCCGTTTCGGTCTGCGCCTCACCTACATACGCCAGGAGCAACCTCTG 

13201 + + + + + + 13260 

TACCCACTGCCGTCGGCAAAGCCAGACGCGGAGTGGATGTATGCGGTCCTCGTTGGAGAC 
10 MGDGSRFGLRLTYIRQEQPL 

GGCATCGCGCACTGCGTGAAACTGGCCCGAGACTTCCTCGACGAGGACGACTTCGTCCTC 

13261 + + + + + + 13320 

CCGTAGCGCGTGACGCACTTTGACCGGGGTCTGAAGGAGCTGCTCCTGCTGAAGCAGGAG 
10 GIAHCVKLARDFLDEDDFVL 

TACCTAGGCGACATCATGCTGGACGGAGACCTGTCCGCGCAGGCGGGGCACTTCCTCCAC 

13321 + + + + + + 13380 

ATGGATCCGCTGTAGTACGACCTGCCTCTGGACAGGCGCGTCCGCCCCGTGAAGGAGGTG 
10 YLGDIMLDGDLSAQAGHFLH 

ACCCGCCCCGCCGCGCGGATCGTCGTGCGCCAGGTGCCCGACCCCCGGGCCTTCGGGGTG 

13381 + + + + + + 13440 

TGGGCGGGGCGGCGCGCCTAGCAGCACGCGGTCCACGGGCTGGGGGCCCGGAAGCCCCAC 
10 TRPAARIVVRQVPDPRAFGV 

ATCGAGCTGGACGGCGAAGGGCGTGTGCTGCGCCTGGTCGAGAAACCCCGTGAACCGCGC 

13441 + + + + + + 13500 

TAGCTCGACCTGCCGCTTCCCGCACACGACGCGGACCAGCTCTTTGGGGCACTTGGCGCG 
10 IELDGEGRVLRLVEKPREPR 

AGCGACCTCGCGGCGGTCGGCGTGTACTTCTTCACCGCGGACGTGCACCGCGCCGTCGAC 

13501 + + + + + + 13560 

TCGCTGGAGCGCCGCCAGCCGCACATGAAGAAGTGGCGCCTGCACGTGGCGCGGCAGCTG 
10 SDLAAVGVYFFTADVHRAVD 

GCGATTAGCCCGAGCCGACGGGGCGAGCTGGAAATCACCGACGCCATCCAGTGGCTGCTG 

13561 + + + + + + 13620 

CGCTAATCGGGCTCGGCTGCCCCGCTCGACCTTTAGTGGCTGCGGTAGGTCACCGACGAC 
10 AISPSRRGELEITDAIQWLL 

GAGCAGGGCCTGCCGGTCGAGGCCGGCCGCTACACGGACTACTGGAAGGACACCGGCCGG 

13621 + + + + + + 13680 

CTCGTCCCGGACGGCCAGCTCCGGCCGGCGATGTGCCTGATGACCTTCCTGTGGCCGGCC 
10 EQGLPVEAGRYTDYWKDTGR 
GTCGAGGACGTCGTGGAGTGCAACCGGCGGATGCTCGGCCGTCTGGCGCTCCAGGTGTCG 

13681 + + + + + + 13740 

CAGCTCCTGCAGCACCTCACGTTGGCCGCCTACGAGCCGGCAGACCGCGAGGTCCACAGC 
VEDVVECNRRMLGRLALQVS 

GGCGAGGTGGACCCGGAGAGCGAACTGGTGGGTGCGGTGGTCGTCGAGGAGGGCGCCCGG 

13741 + + + + + + 13800 

CCGCTCCACCTGGGCCTCTCGCTTGACCACCCACGCCACCAGCAGCTCCTCCCGCGGGCC 
GEVDPES ELVGAVVVEEGAR 

GTGACGCGTTCGCGGGTCGTGGGACCAGCGGTGATCGGCGCGGGCACGGTCGTCGAGGAC 

13801 + + + + + + 13860 

CACTGCGCAAGCGCCCAGCACCCTGGTCGCCACTAGCCGCGCCCGTGCCAGCAGCTCCTG 
VTRSRVVGPAVI GAGTVVED 

AGCCAGATCGGACCGTACGCCTCCATCGGCCGGCGCTGCACCGTGCGGGCGTCCCGGCTC 

13861 + + + + + + 13920 

TCGGTCTAGCCTGGCATGCGGAGGTAGCCGGCCGCGACGTGGCACGCCCGCAGGGCCGAG 
SQIGPYAS IGRRCTVRASRL 

TCCGACTCCATCGTCCTTGACGACGCCTCGATCCTCGCGGTGAGCGGACTGCACGGCTCG 

13921 + + + + + + 13980 

AGGCTGAGGTAGCAGGAACTGCTGCGGAGCTAGGAGCGCCACTCGCCTGACGTGCCGAGC 
SDS IVLDDAS ILAVSGLHGS 

CTGATCGGAAGGGGCGCGCGGATCGCGCCCGGGGCCCGGGGCGAGGCCCGGCACCGGCTG 

13981 + + + + + + 14040 

GACTAGCCTTCCCCGCGCGCCTAGCGCGGGCCCCGGGCCCCGCTCCGGGCCGTGGCCGAC 
L IGRGARIAPGARGEARHRL 

GTCGTCGGCGACCACGTGCAGATCGAGATCGCGGCCTGACGCACCCACCGGAGCACCGGG 

14041 + + + + + + 14100 

CAGCAGCCGCTGGTGCACGTCTAGCTCTAGCGCCGGACTGCGTGGGTGGCCTCGTGGCCC 
-* VVGDHVQ I E IAA*- 

GGGAGGCTCGGCAGGGGCGTCAGGCCGTAAGAAGGGCTGCCGGGGCGGGACGGACCCGCC 

14101 + + + + + + 14160 

CCCTCCGAGCCGTCCCCGCAGTCCGGCATTCTTCCCGACGGCCCCGCCCTGCCTGGGCGG 

CCGGCAGCCCACAGGTCCCCGGTCCGCGGATATGGGGGACTCGAGGTTCGATCAGCCGAA 

14161 + + + + + + 14220 

GGCCGTCGGGTGTCCAGGGGCCAGGCGCCTATACCCCCTGAGCTCCAAGCTAGTCGGCTT 
* * G F - 

GGTCAGAGCCACGTGGCCGAGGTCGAGCCCGGAGTTGCCGGCGCCGAGGTTACAGGCGGC 

14221 + + + + + + 14280 

CCAGTCTCGGTGCACCGGCTCCAGCTCGGGCCTCAACGGCCGCGGCTCCAATGTCCGCCG 
TLAVHGLDLGSNGAGLNCAA- 

CGTGGCGCAGTCGACGCTGCCGACCGGCGTGCCTTCGGGCGTGGAGCCCGTGTACGACTT 

14281 + + + + + + 14340 

GCACCGCGTCAGCTGCGACGGCTGGCCGCACGGAAGCCCGCACCTCGGGCACATGCTGAA 
TACDVSGVPTGEPTSGTYSK- 

GCGCACGACGAAGCTGAACGACGCCGCTCCGGACGCGTCCGTGGTGAAGGACGTCGCGGT 

14341 + + + + + + 14400 

CGCGTGCTGCTTCGACTTGCTGCGGCGAGGCCTGCGCAGGCACCACTTCCTGCAGCGCCA 
RVVFSFSAAGSADTTFSTAT- 

CGCCGGGTTGCACGCGTCCTGGCCACCGACCGGAGCGCACTGGGCGATGTAGTAGGTCTC 

14401 + + + + + + 14460 

GCGGCCCAACGTGCGCAGGACCGGTGGCTGGCCTCGCGTGACCCGCTACATCATCCAGAG 
APNCADQGGVPACQAI YYTE- 



GCCGGCGGCGGCACCGCTGACCGACACCGACACGCTCTGTCCGTCACTCAGACCCGAGGC 
14461 + + + + + + 14520 

CGGCCGCCGCCGTGGCGACTGGCTGTGGCTGTGCGAGACAGGCAGTGAGTCTGGGCTCCG 
9 GAAAGSVSVSVSQGDSLGSA- 

GGGACTGACGGAGAAGGCGGGCGCGGCGAAGGCGACGGACTGTGCGGCGGCGGCCAGGCC 

14521 + + + + + + 14580 

CCCTGACTGCCTCTTCCGCCCGCGCCGCTTCCGCTGCCTGACACGCCGCCGCCGGTCCGG 
9 P S V S FAPAAFAVS QAAAAL G - 

GATGGATGCGACGGCCACGACGCCGAACCTGGAAGCACGGCGGGACATGTGACGTAACGA 

14581 + + + + + + 14640 

CTACCTACGCTGCCGGTGCTGCGGCTTGGACCTTCGTGCCGCCCTGTACACTGCATTGCT 
9 I SAVAVVGFRSARRSMHRLS- 

CATGCGTAGGCTCCGATTCGAGGAGGGGGTTGATCACTCCATGAAAGGATCACCTCGCCG 

14641 + + + + + + 14700 

GTACGCATCCGAGGCTAAGCTCCTCCCCCAACTAGTGAGGTACTTTCCTAGTGGAGCGGC 

9-< M - 

8-* * R A 

GACGGCCGCCTGCATCTCCCTCTGTGCTCTCGTGGATTTCCGGCACGGCACTCCCGTCGA 

14701 + + + + + + 14760 

CTGCCGGCGGACGTAGAGGGAGACACGAGAGCACCTAAAGGCCGTGCCGTGAGGGCAGCT 
8 PRGGADGETSEHIEPVASGD 

CGGCCGCCCGCAGAATGCGGCAGACCCCCCGCACCTCCTCCGGCCCCACCGCCGTACCGG 

14761 + + + + + + 14820 

GCCGGCGGGCGTCTTACGCCGTCTGGGGGGCGTGGAGGAGGCCGGGGTGGCGGCATGGCC 
8 VAARLIRCVGRVEEPGVATG 

TGGGCAGCGACAGCACCCGCTCGGTGAGCGCCTCCACCTTCGGGAGCGGATCGGGCGCGT 

14821 + + + + + + 14880 

ACCCGTCGCTGTCGTGGGCGAGCCACTCGCGGAGGTGGAAGCCCTCGCCTAGCCCGCGCA 
8 TPLSLVRETLAEVKPLPDPA 

GGCGCGCGAGGTCGGACCGGTAGGGCTCGCAGCTGTGGCAGCCGGGGCTGAAGTAGGCGC 

14881 + + + + + + 14940 

CCGCGCGCTCCAGCCTGGCCATCCCGAGCGTCGACACCGTCGGCCCCGACTTCATCCGCG 
8 HRALDSRYPECSHCGPS FYA 

GGGCCAGGACGTTGTGCCGTTGGAGCACCGCCTGGAGTTCGTCGCGGTGCAGCCCGGCGC 

14941 + + + + + + 15000 

CCCGGTCCTGCAACACGGCAACCTCGTGGCGGACCTCAAGCAGCGCCACGTCGGGCCGCG 
8 RALVNHRQLVAQLEDRHLGA 

GGACGGCGTCCACCTCGATGACGACGTACTGGCAGTTCGACAGCTCGTTCGGATCCTGCG 

15001 + + + + + + 15060 

CCTGCCGCAGGTGGAGCTACTGCTGCATGACCGTCAAGCTGTCGAGCAAGCCTAGGACGC 
8 RVADVE IVVYQCNSLENPDQ 

GGCGGACCCGGACGCCGGGCAGTCCGTCGAGGTACTGCTCGTACAGACGGTAGTTGCGCC 

15061 + + + + + + 15120 

CCGCCTGGGCCTGCGGCCCGTCAGGCAGCTCCATGACGAGCATGTCTGCCATCAACGCGG 
8 PRVRVGPLGDLYQEYLRYNR 

GGTTGATCGCGGTGAAGTGATCGGCGGACTCCAGGGAGGTGAGGCCCATGGCCGCGCTGA 

15121 + + + + + + 15180 

CCAACTAGCGCCACTTCACTAGCCGCCTGAGGTCCCTCCACTCCGGGTACCGGCGCGACT 
8 RNIATFHDASELSTLGMAAS 

TCTCGTGCATCCGCGCGACCGTTCCGCTCCCGGTGATCTCATGCGCGGCGTTGAGCCCCT 

15181 + + + + + + 15240 

AGAGCACGTAGGCGCGCTGGCAAGGCGAGGGCCACTAGAGTACGCGCCGCAACTCGGGGA 
8 IEHMRAVTGSGTIEHAANLG 

GGTGGCGCATGGCCCGGAGCCGGTCGGCCAGGGCGTCGTCGTCGGTGACGATCGCCCCGC 
15241 + + + + + + 15300 

CCACCGCGTACCGGGCCTCGGCCAGCCGGTCCCGCAGCAGCAGCCACTGCTAGCGGGGCG 
QHRMARLRDALADDDTVIAG 

CCTCGAAGCTGTTCACGAACTTCGTCGCCTGGAAGCTGAAGATCTCCGCCGTGCCGAAGC 

15301 + + + + + + 15360 

GGAGCTTCGACAAGTGCTTGAAGCAGCGGACCTTCGACTTCTAGAGGCGGCACGGCTTCG 
GEFSNVFKTAQFSFIEATGF 

CGCCGATCGGCTTCGACCGGTAGGTGCAGCCGAAGGCGTGGGCGGCATCGAAGAGCAGGT 

15361 + + + + + + 15420 

GCGGCTAGCCGAAGCTGGCCATCCACGTCGGCTTCCGCACCCGCCGTAGCTTCTCGTCCA 
GGI PKSRYTCGFAHAADFLL 

GCAGCCCGTGCTCGGCGGCCAGCTTGGTCAGCTCGTCGATCCGGGCCGGTCTGCCGAAGA 

15421 + + + + + + 15480 

CGTCGGGCACGAGCCGCCGGTCGAACCAGTCGAGCAGCTAGGCCCGGCCAGACGGCTTCT 
HLGHEAALKTLEDIRAPRGF 

CGTGCACGTCCAGGATGGCGCGGGTACGCGGGCCGATGAGCCGCTCCACGTGTGCCACGT 

15481 + + + + + + 15540 

GCACGTGCAGGTCCTACCGCGCCCATGCGCCCGGCTACTCGGCGAGGTGCACACGGTGCA 
VHVD L IARTR PG I LREVHAV 

CCGCGGTTCCGGTCTCCTCGTCCAGTTCGCAGAAGACAGGCACCGCACCGATCCAGTCCA 

15541 + + + + + + 15600 

GGCGCCAAGGCCAGAGGAGCAGGTCAAGCGTCTTCTGTCCGTGGCGTGGCTAGGTCAGGT 

DATGTEEDLECFVPVAGIWD 

GTGCGTGGGCGGTGGCGACCCAGGTGAAGGAGGGCACGATCACCTCGTCCCCAGGACCGA 

15601 + + + + + + 15660 

CACGCACCCGCCACCGCTGGGTCCACTTCCTCCCGTGCTAGTGGAGCAGGGGTCCTGGCT 

LAHATAVWTFS PVIVEDGPG 

TGCCCAGGGCCTTCGCGGCGACCTGGATGCCGGTGGTGGCGTTCGATACGGCGACGCAGT 

15661 + + + + + + 15720 

ACGGGTCCCGGAAGCGCCGCTGGACCTACGGCCACCACCGCAAGCTATGCCGCTGCGTCA 
IGLAKAAVQIGTTANSVAVC 

GCCTGACCTGGGTCAGCTCGGCCACACGGGCCTCGAACTCCCGGACCAGGGGGCCGTCAT 

15721 + + + + + + 15780 

CGGACTGGACCCAGTCGAGCCGGTGTGCCCGGAGCTTGAGGGCCTGGTCCCCCGGCAGTA 
HRVQTLEAVRAEFERVLPGD 

TGGTGAACCACAGGCGCTCCAGCGCCCCGTCGATCCGTTCCATCAAACGGTCGCGGGAGC 

15781 + + + + + + 15840 

ACCACTTGGTGTCCGCGAGGTCGCGGGGCAGCTAGGCAAGGTAGTTTGCCAGCGCCCTCG 
NTFWLRELAGDIREMLRDRS 

CCACGTTCGGGCGTCCCACGTGCAGCGGTTCGCTGAAGTAGGGCGTGGGTAGGGAGTCCA 

15841 + + + + + + 15900 

GGTGCAAGCCCGCAGGGTGCACGTCGCCAAGCGACTTCATCCCGCACCCATCCCTCAGGT 
GVNPRGVHLPESFYPTPLSD 

GACGCACCGGGCCGCCGCTCATGCCGTGCGCACGCCGACGAAGAGGCCGGGGCTGTTGGG 

15901 + + + + + + 15960 

CTGCGTGGCCCGGCGGCGAGTACGGCACGCGTGCGGCTGCTTCTCCGGCCCCGACAACCC 
DAPGRRSCRAHADEEAGAVG 
THRAAAHAVRTPTKRPGLLG - 
RTGPPLMPCARRRRGRGCWA- 

15901 + + + + + + 15960 

* *ATRVGVFLGPSNP- 

< LRVPGGSM 

CCGGCCGTCGGCCAGCCGGAAGCCGGGCACGAACCGCACCGAGAGCCCCACCGATTCGAA 
15961 + + + + + + 16020 
GGCCGGCAGCCGGTCGGCCTTCGGCCCGTGCTTGGCGTGGCTCTCGGGGTGGCTAAGCTT 



7 RGDALRFGPVFRVS LGVSEF- 

GGCGTCGGTGTACTGCTCGCGGGTGAAGAGGCTGGAGGTCAGGACCTCGGAGAACTCTCT 

16021 + + + + + + 16080 

CCGCAGCCACATGACGAGCGCCCACTTCTCCGACCTCCAGTCCTGGAGCCTCTTGAGAGA 
7 ADTYQERTFLS STLVESFER- 

GAAGCCGGAGGCGTCCGCGACCCGGAACCGGACCTCCAGACGTGACTTGTCGCCCTGGCG 

16081 + + + + + + 16140 

CTTCGGCCTCCGCAGGCGCTGGGCCTTGGCCTGGAGGTCTGCACTGAACAGCGGGACCGC 
7 FGSADAVRFRVELRS KDGQR- 

CACGGAGTGCGTCATCCGCGTGATGACACGGCCCTCCTCCTGGTGCAGATGGCCGCCGAC 

16141 + + + + + + 16200 

GTGCCTCACGCAGTAGGCGCACTACTGTGCCGGGAGGAGGACCACGTCTACCGGCGGCTG 
7 VSHTMRTIVRGEEQHLHGGV- 

ATGCCCGTCGAGGAAGTTCTCGGGGAAATACCAGGGTTCGGCGACGAGGACTCCCCCGGG 

16201 + + + + + + 16260 

TACGGGCAGCTCCTTCAAGAGCCCCTTTATGGTCCCAAGCCGCTGCTCCTGAGGGGGCCC 
7 HGDLFNEPFYWPEAVLVGGP- 

GTTCAGGTGGTGGGCCATGGCCGACACCGCGGCCTTGAGCTCGGTGACGGACCCCATCTC 

16261 + + + + + + 16320 

CAAGTCCACCACCCGGTACCGGCTGTGGCGCCGGAACTCGAGCCACTGCCTGGGGTAGAG 
7 NLHHAMAS VAAKL ETVS GME- 

GCCGAGCGCGTTGCCCATGCAGGTGATCGCGTCGAAGGTGCGGCCCAGGTCGAACGAACG 

16321 + + + + + + 16380 

CGGCTCGCGCAACGGGTACGTCCACTAGCGCAGCTTCCACGCCGGGTCCAGCTTGCTTGC 
7 GLANGMCT IADFT RGLDFS R - 

CATGTCACCGGCGTGCAGCGGGACGCCGGGAAGCCGGCCCGCCGCCTGCTCCAGCATCGC 

16381 + + + + + + 16440 

GTACAGTGGCCGCACGTCGCCCTGCGGCCCTTCGGCCGGGCGGCGGACGAGGTCGTAGCG 
7 MDGAHL PVGPLRGAAQELMA- 

GGGCGCGTACTCGAGGCCCTCCACATGGCCGAAGAGCGTGGCGAGCGTCTCCAGATGGGC 

16441 + + + + + + 16500 

CCCGCGCATGAGCTCCGGGAGGTGTACCGGCTTCTCGCACCGCTCGCAGAGGTCTACCCG 
7 PAYELGEVHGFLTALTELHA- 

TCCGGTGCCGCAGGCGACGTCCAGGAGCGACACGGCGTCGGGGCGGGCGGCGAGGATCAG 

16501 + + + + + + 16560 

AGGCCACGGCGTCCGCTGCAGGTCCTCGCTGTGCCGCAGCCCCGCCCGCCGCTCCTAGTC 
7 GTGCAVDLLSVAD PRAAL I L- 

CTCGGTGAGCCCGCGGGCCTCCAGGTCGAAGTCCTTGCCGCGGCTGCGGAACACGAGGTC 

16561 + + + + + + 16620 

GAGCCACTCGGGCGCCCGGAGGTCCAGCTTCAGGAACGGCGCCGACGCCTTGTGCTCCAG 
7 ETLGRAELDFDKGRSRFVLD- 

GTAGAACTTCGCGTGCTCGGGGCCGTACTCCATCAGACGAGCTCCTTCGCAGACTGGGCG 

16621 + + + + + + 16680 

CATCTTGAAGCGCACGAGCCCCGGCATGAGGTAGTCTGCTCGAGGAAGCGTCTGACCCGC 
7-< YFKAHEPGYEM- 

6-* *VLEKASQA- 

GAGATGATTCTGGGCTCCGGGATGGGAACGATGAACTTCCCTCCCGCCTCCAGGAAGCGG 

16681 + + + + + + 16740 

CTCTACTAAGACCCGAGGCCCTACCCTTGCTACTTGAAGGGAGGGCGGAGGTCCTTCGCC 
6 SIIRPEPIPVIFKGGAELFR- 

CGCTCCTTGCGGACGACCTCGTCGGTGTAGTTCCAGGCGAGGAGGAGGTAGTAGTCCGGC 
16741 + + + + + + 16800 
GCGAGGAACGCCTGCTGGAGCAGCCACATCAAGGTCCGCTCCTCCTCCATCATCAGGCCG 
6 REKRVVEDTYNWALLLYYDP- 

TCGGTGGCAGCGACCTCCTCCGGAGGAAGGACCGGGATGCGGTTCCCCGGCAGCAGTTTG 

16801 + + + + + + 16860 

AGCCACCGTCGCTGGAGGAGGCCTCCTTCCTGGCCCTACGCCAAGGGGCCGTCGTCAAAC 
6 ETAAVEEPPLVPIRNGPLLK- 

CCGTGCTTGAGGCTGGTGGTGTCGCCGCAGACGGTGATGTCCTGATCCGTCAGACCGCAG 

16861 + + + + + + 16920 

GGCACGAACTCCGACCACCACAGCGGCGTCTGCCACTACAGGACTAGGCAGTCTGGCGTC 
6 GHKLSTTDGCVTIDQDTLGC 

GCCATCAGCAACTGGGTCCCCTTGGACGGTGCTCCGTAGCCGGCCACGCGGTGGCCGTCC 

16921 + + + + + + 16980 

CGGTAGTCGTTGACCCAGGGGAACCTGCCACGAGGCATCGGCCGGTGCGCCACCGGCAGG 
6 AMLLQTGKS PAGYGAVRHGD 

GCGGCCAGACCGCGAACGAGCGTACGGATCGCTTCGGTCACGCGCGTCACCCGCTCGGCG 

16981 + + + + + + 17040 

CGCCGGTCTGGCGCTTGCTCGCATGCCTAGCGAAGCCAGTGCGCGCAGTGGGCGAGCCGC 
6 AALGRVLTRIAETVRTVREA- 

AACGCCCGGTAGGGGGCATCCGTCAGCAGTCCGCGCTCCTCCTCCAGGCCGAGCAGCGCC 

17041 + + + + + + 17100 

TTGCGGGCCATCCCCCGTAGGCAGTCGTCAGGCGCGAGGAGGAGGTCCGGCTCGTCGCGG 
6 FARYPADTLLGREEELGLLA- 

GCGACCGAGGGCTCCGGGACCCGTGCGGCCGACTCGCGCGCGGCGACGACCGCGATCGAA 

17101 + + + + + + 17160 

CGCTGGCTCCCGAGGCCCTGGGCACGCCGGCTGAGCGCGCGCCGCTGCTGGCGCTAGCTT 
6 AVSPEPVRAASERAAVVAIS- 

CCGCCGTGCACGGCGACCCGCTCCACGTCGATGATCCGCAGGCCGTGCGCGCCGAAGAGG 

17161 + + + + + + 17220 

GGCGGCACGTGCCGCTGGGCGAGGTGCAGCTACTAGGCGTCCGGCACGCGCGGCTTCTCC 
6 GGHVAVREVDI IRLGHAGFL- 

TGGCGCAGTGTGTGCAGGGAGAAGTACGACAGGTGCTCGTGGTAGATCGTGTCGAACTGG 

17221 + + + + + + 17280 

ACCGCGTCACACACGTCCCTCTTCATGCTGTCCACGAGCACCATCTAGCACAGCTTGACC 
6 HRLTHLSFYSLHEHYITDFQ- 

TTCTCGTCGAGCAGGTTCAGCAGGTACGGCACCTCGATGACCAGGACGCCGTCGTCGTCG 

17281 + + + + + + 17340 

AAGAGCAGCTCGTCCAAGTCGTCCATGCCGTGGAGCTACTGGTCCTGCGGCAGCAGCAGC 
6 NEDLLNLLYPVEIVLVGDDD 

AGCACTGCGTCGACGCCGTCCAGGATGCGGTGCACGTCGTCGATGTGCGCGAAGCACTGG 

17341 + + + + + + 17400 

TCGTGACGCAGCTGCGGCAGGTCCTACGCCACGTGCAGCAGCTACACGCGCTTCGTGACC 
6 LVADVGDL I RHVDD I HAFCQ- 

CGGCCGATGACGGCCTTGGCCCTGCCCTGCTCAAGGGCGATGCGGCCCGCGGGCTCCGGG 

17401 + + + + + + 17460 

GCCGGCTACTGCCGGAACCGGGACGGGACGAGTTCCCGCTACGCCGGGCGCCCGAGGCCC 
6 RGIVAKARGQELAI RGAPE P 

CCGAAGAAGTCCGGGTCCGTGGGGATCCCCCGGGCGTTGGCGATCTCGGCGAGGTTGGCC 

17461 + + + + + + 17520 

GGCTTCTTCAGGCCCAGGCACCCCTAGGGGGCCCGCAACCGCTAGAGCCGCTCCAACCGG 
6 GFFDPDTP IGRANAIEALNA- 

GCCGGGTCGACCCCGGCCACCCGCATGCCCGCCGCCCGGAACATCGCGAGCTGGGTGCCG 

17521 + + + + + + 17580 

CGGCCCAGCTGGGGCCGGTGGGCGTACGGGCGGCGGGCCTTGTAGCGCTCGACCCACGGC 
6 APDVGAVRMGAARFMALQTG- 

ACGTTGCTGCCCAGCTCCACGACCAGGTCGCCGGAGGCGAGGCTTGCCCGGCGGGTCGCC 

17581 + + + + + + 17640 

TGCAACGACGGGTCGAGGTGCTGGTCCAGCGGCCTCCGCTCCGAACGGGCCGCCCAGCGG 
6 VNSGLEVVLDGSALSARRTA- 

AGCCCGACGATGTGCGCCATGTGCTCGCGGATCTGGTCGGAGTCGGAGGAGACGTAGACG 

17641 + + + + + + 17700 

TCGGGCTGCTACACGCGGTACACGAGCGCCTAGACCAGCCTCAGCCTCCTCTGCATCTGC 
6 LGVIHAMHERIQDSDSSVYV- 

TAGTGCTTGAACAGTGTCCCGGGGTCGACGACATGGCGAAGCGTCATCAGCCGGCACGAC 

17701 + + + + + + 17760 

ATCACGAACTTGTCACAGGGCCCCAGCTGCTGTACCGCTTCGCAGTAGTCGGCCGTGCTG 
6 YHKFLTGPDVVHRLTMLRCS- 

CGGCACACGATGACGTCGAGCGGGAAGACGTCCTGCGCCTCATCGGCGTCGGCCGGATCG 

17761 + + + + + + 17820 

GCCGTGTGCTACTGCAGCTCGCCCTTCTGCAGGACGCGGAGTAGCCGCAGCCGGCCTAGC 
6 RCVIVDLPFVDQAEDADAPD 

ACGAACCCGTTGGCCAGCGGCAGCGAGCCGAAGGAGATCACCTCGGTCCAGTCGTCCGCA 

17821 + + + + + + 17880 

TGCTTGGGCAACCGGTCGCCGTCGCTCGGCTTCCTCTAGTGGAGCCAGGTCAGCAGGCGT 
6 VFGNALPLSGFS IVETWDDA- 

CCGCATACACGGCACGTCTCGTCCCGCCTGCATTTCTCCAGCATGAAGTCTCCTGACGGC 

17881 + + + + + + 17940 

GGCGTATGTGCCGTGCAGAGCAGGGCGGACGTAAAGAGGTCGTACTTCAGAGGACTGCCG 
6-< GCVRCTEDRRCKELM- 

GAATGCCGACGCATCGGGCCCGTCGGTCCGGGGACGGTCAATCTAGGGTTCCGGCCGACG 

17941 + + + + + + 18000 

CTTACGGCTGCGTAGCCCGGGCAGCCAGGCCCCTGCCAGTTAGATCCCAAGGCCGGCTGC 

GGCGCTCCACTTCGTATGTGCCCTACTGGTTCAGCGGAGCGGACGGGTGAACGCCCGTAC 

18001 + + + + + + 18060 

CCGCGAGGTGAAGCATACACGGGATGACCAAGTCGCCTCGCCTGCCCACTTGCGGGCATG 
17-* * RL PRT FARV- 

GTCCTCGATGAGGAGCTGCGGCTGCTCCATGGCCGCGAAGTGCCCGCCGCGGTCGAACTC 

18061 + + + + + + 18120 

CAGGAGCTACTCCTCGACGCCGACGAGGTACCGGCGCTTCACGGGCGGCGCCAGCTTGAG 
17 DEILLQPQEMAAFHGGRDFE- 

GGTCCACCGCGTCAGGGTCGGCAGGATGCCCTCGGCGAACGACCGGATCGGCCGGGTGGC 

18121 + + + + + + 18180 

CCAGGTGGCGCAGTCCCAGCCGTCCTACGGGAGCCGCTTGCTGGCCTAGCCGGCCCACCG 
17 TWRTLTPLIGEAFSRI PRTA- 

GTCGTCCGGGAACACCGCGACGCCGACGGGGGCCGTCAGCGGCCAGGGCCCGCCCCAGGT 

18181 + + + + + + 18240 

CAGCAGGCCCTTGTGGCGCTGCGGCTGCCCCCGGCAGTCGCCGGTCCCGGGCGGGGTCCA 
17 DDPFVAVGVPATLPWPGGWT- 

GCGGGCGAAGTCCGCCATGCCGCGAGCCGACTCGTAGTACAACTGAGCGCTGGAACCGGC 

18241 + + + + + + 18300 

CGCCCGCTTCAGGCGGTACGGCGCTCGGCTGAGCATCATGTTGACTCGCGACCTTGGCCG 

17 RAFDAMGRAS EYYLQAS SGA- 

CGTCGCGGTCAGCCAGTAGATCATCACGTGGGTGAGCAGCCGGTCCCGGGAGATGGCCTC 

18301 + + + + + + 18360 

GCAGCGCCAGTCGGTCATCTAGTAGTGCACCCACTCGTCGGCCAGGGCCCTCTACCGGAG 
17 TATLWYIMVHTLLRDRS I A E - 
CTCCACGTTCTTGCCGCCGCTCCACTCCTGGAACTTGTCGAGAATCCAGGCGAGCTGGCC 



18361 + + + + + + 18420 

GAGGTGCAAGAACGGCGGCGAGGTGAGGACCTTGAACAGCTCTTAGGTCCGCTCGACCGG 

17 EVNKGGSWEQFKDL IWALQG- 

GACCGGGGAGTCGGTGAGGCCGTAGGCCAGGGTCTGCGGGCGGGTGGCCTGGATGCGCTG 

18421 + + + + + + 18480 

CTGGCCCCTCAGCCACTCCGGCATCCGGTCCCAGACGCCCGCCCACCGGACCTACGCGAC 
17 VPSDTLGYALTQPRTAQ IRQ- 

CCAGCCGATGCCGGTGTCGGCGAACTCCCCGCTGTGCGCCAGCTTGCCCAGGTCGCTCTC 

18481 + + + + + + 18540 

GGTCGGCTACGGCCACAGCCGCTTGAGGGGCGACACGCGGTCGAACGGGTCCAGCGAGAG 
17 WGIGTDAFEGSHALKGLDSE- 

GTCCAGGCGCCCGATGGCCTCCGGGGCGTCCTGGGGCGGGAAGGTCACCAGCATGTTCAG 

18541 + + + + + + 18600 

CAGGTCCGCGGGCTACCGGAGGCCCCGCAGGACCCCGCCCTTCCAGTGGTCGTACAAGTC 
17 DLRGIAEPADQPPFTVLMNL- 

GTGGACGCCGGCCACGTGCTCGGGGTCGGCCAGCCCCAGCTCCAGCGAGACGACCTTTCC 

18601 + + + + + + 18660 

CACCTGCGGCCGGTGCACGAGCCCCAGCCGGTCGGGGTCGAGGTCGCTCTGCTGGAAAGG 
17 HVGAVHE PDALGLELSVVKG- 

CCAGTCGCCGCCCTGGGCGACGTAACGCTCGTAGCCGAGGCGGTTCATCAGCTCCGCCCA 

18661 + + + + + + 18720 

GGTCAGCGGCGGGACCCGCTGCATTGCGAGCATCGGCTCCGCCAAGTAGTCGAGGCGGGT 
17 WDGGQAVYREYGLRNMLEAW- 

GGCGCGTGCGATCCGCCGCACGTCCCAGCCCGGCTCGGCAGTCGGGCCGGAGAAGCCGTA 

18721 + + + + + + 18780 

CCGCGCACGCTAGGCGGCGTGCAGGGTCGGGCCGAGCCGTCAGCCCGGCCTCTTCGGCAT 
17 ARAI RRVDWGPEATPGS FGY- 

GCCCGGCATGGAGGGGACGACGACGTGGAAGGCGTCCGCCGGGTCGCCGCCGTGCGCGCG 

18781 + + + + + + 18840 

CGGGCCGTACCTCCCCTGCTGCTGCACCTTCCGCAGGCGGCCCAGCGGCGGCACGCGCGC 
17 GPMS PVVVHFADAPDGGHAR- 

CGGGTCGCTCAGCGGCCCGATGACGTCGAGGAACTCGGCGACCGAGCCCGGCCAGCCGTG 

18841 + + + + + + 18900 

GCCCAGCGAGTCGCCGGGCTACTGCAGCTCCTTGAGCCGCTGGCTCGGGCCGGTCGGCAC 
17 PDSLPGIVDLFEAVSGPWGH- 

GGTGAGGATCAGCGGGATCGCGTCCGGCTCGGGCGAACGCACGTGAAGGAAGTGCACGTC 

18901 + + + + + + 18960 

CCACTCCTAGTCGCCCTAGCGCAGGCCGAGCCCGCTTGCGTGCACTTCCTTCACGTGCAG 
17 TLILPIADPEPSRVHLFHVD- 

GGCGCCGTCGATCGTGGTGACGAACTGGGGGAACGCGTTCAGCTCGGCCTCCGCGGCACG 

18961 + + + + + + 19020 

CCGCGGCAGCTAGCACCACTGCTTGACCCCCTTGCGCAAGTCGAGCCGGAGGCGCCGTGC 
17 AGD I TTVFQPFANLEAEAAR- 

CCAGTCGTAGCCGTGGCGCCAGTGGTCGGTGAGCTCCTTGAGGTAGGACAGCGGCACTCC 

19021 + + + + + + 19080 

GGTCAGCATCGGCACCGCGGTCACCAGCCACTCGAGGAACTCCATCCTGTCGCCGTGAGG 
17 WDYGHRWHDTLEKLY S L PVG- 

GCGGTCCCATCCGGATCCGGGTATCTCGGACGGCCACCGGGTCGCGTCGATCCGCCGGGT 

19081 + + + + + + 19140 

CGCCAGGGTAGGCCTAGGCCCATAGAGCCTGCCGGTGGCCCAGCGCAGCTAGGCGGCCCA 
17 RDWGSGPI ESPWRTADIRRT- 



TAAGGTCGTCGAATGTCGGACTGGGTCGATCTCGATACGGAAGGGACGCACAGTGAATCC 



19141 + + + + + + 19200 

ATTCCAGCAGCTTACAGCCTGACCCAGCTAGAGCTATGCCTTCCCTGCGTGTCACTTAGG 
17-< LTTSHRVPD I E I RFPRM- 

ACCCTCGTGATTGTGGGAGCGGGGCGGCGCGAGGCGGCCGCCCCGATGTGATCCGGGGAC 

19201 + + + + + + 19260 

TGGGAGCACTAACACCCTCGCCCCGCCGCGCTCCGCCGGCGGGGCTACACTAGGCCCCTG 

CGTGTCTCAGGCCGGTTCGGCCGGCGCGGCCGCGCCTTCCCGTGCGGAGAAGGACCGCAG 

19261 + + + + + + 19320 

GCACAGAGTCCGGCCAAGCCGGCCGCGCCGGCGCGGAAGGGCACGCCTCTTCCTGGCGTG 
16-* *AP EAPAAAGERAS FSRV- 

GGAGGACAGGAAGTTGCGGATCATCGGCATGCCGTGTTCGGTCCGGAAGCTCTCCGGATG 

19321 + + + + + + 19380 

CCTCCTGTCCTTCAACGCCTAGTAGCCGTACGGCACAAGCCAGGCCTTCGAGAGGCCTAC 
16 S SLFNRIMPMGHETRFSEPH- 

GAACTGGACGGACTCCACCGGCAGCGAACGGTGGCGCAGGCCCATCACGTACCCGTCGTC 

19381 + + + + + + 19440 

CTTGACCTGCCTGAGGTGGCCGTCGCTTGCCACCGCGTCCGGGTAGTGCATGGGCAGCAG 
16 F QVS EVPLS RHRL GMVYGDD- 

CGTGGAGCGCCCGGTGACCTCGAGGGACGGCGGGACCGTGCCCTCCGGCACGATCAGTGA 

19441 + + + + + + 19500 

GCACCTCGCGGGCCACTGGAGCTCCCTGCCGCCCTGGCACGGGAGGCCGTGCTAGTCACT 
16 TSRGTVELSPPVTGEPVILS- 

GTGGTAGCGGGTCGCGAAGAACCCCGCGGGCAGCCCGGTGAACACTCCGCGCCCGTCGTG 

19501 + + + + + + 19560 

CACCATCGCCCAGCGCTTCTTGGGGCGCCCGTCGGGCCACTTGTGAGGCGCGGGCAGCAC 
16 HYRTAFFGAPLGTFVGRGDH- 

CGTGATCCGGCTCGTCTTCCCGTGCATGAGATGCCGGGCGGGGACGGTGGCGGCGCCGTA 

19561 + + + + + + 19620 

GCACTAGGCCGAGCAGAAGGGCACGTACTCTACGGCCCGCCCCTGCCACCGCCGCGGCAT 
16 TIRSTKGHMLHRAPVTAAGY- 

GGCGCGGGCGACGGCCTGATGCCCCAGACAGACCCCGAGCAGCGGGACCCGGCCGGCGAA 

19621 + + + + + + 19680 

CCGCGCCCGCTGCCGGACTACGGGGTCTGTCTGGGGCTCGTCGCCCTGGGCCGGCCGCTT 
16 ARAVAQHGLCVGLL PVRGAF-

GGCCTGGACGATCTCGACGTGCCCGGAGGTGTCGGGGTGGCCGGGGCCCGGCCCCAGCAG 

19681 + + + + + + 19740 

CCGGACCTGCTAGAGCTGCACGGGCCTCCACAGCCCCACCGGCCCCGGGCCGGGGTCGTC 
16 AQVI EVHGSTDPHGPGPGLL- 

GACCGCGTCCGGCCGCATCAGCCCCATCTCGTCCGGGGTCATGAGATGCGACCGCACCAT 

19741 + + + + + + 19800 

CTGGCGCAGGCCGGCGTAGTCGGGGTAGAGCAGGCCCCAGTACTCTACGCTGGCGTGGTA 
16 VAD PRMLGMED P TMLH S RVM- 

GACGGGCTCCGCGCCGGCGGACATCAGATACTGGCGCAGGATGTCGACGAAGCTGTCGAA 

19801 + + + + + + 19860 

CTGCCCGAGGCGCGGCCGCCTGTAGTCTATGACCGCGTCCTACAGCTGCTTCGACAGCTT 
16 VPEAGASMLYQRLIDVFSDF- 

CGCGTCGACCACCAGGACCCGCGGGGCCTCGGTGCCTGCGCCGGATCCGTCGGGAGACCA 

19861 + + + + + + 19920 

GCGCAGCTGGTGGTCCTGGGCGCCCCGGAGCCACGGACGCGGCCTAGGCAGCCCTCTGGT 
16 ADVVLVRPAETGAGSGDPSW- 

CAAGCTCACAGCAACTCCTCTCCGGTGACCGCCCAGTGAGTGGCGCTCATCTTGGCCAGC 

19921 + + + + + + 19980 

GTTCGAGTGTCGTTGAGGAGAGGCCACTGGCGGGTCACTCACCGCGAGTAGAACCGGTCG 
16-< L S M -

15-* * LLEEGTVAWHTASMKAL 

GTCTCGGTCCACTCCGCCCCCGGTTCGGAATCGGCGACGATTCCGGCCGAGGCCCGGGTG 

19981 + + + + + + 20040 

CAGAGCCAGGTGAGGCGGGGGCCAAGCCTTAGCCGCTGCTAAGGCCGGCTCCGGGCCCAC 
15 TETWEAGPESDAVIGASART- 

CGGTAGACGCCCTCGTGGTGGAAAAGGGTCCGGATGCACAGCGCGAGGTTGGTGTACCCG 

20041 + + + + + + 20100 

GCCATCTGCGGGAGCACCACCTTTTCCCAGGCCTACGTGTCGCGCTCCAACCACATGGGC 
15 RYVGEHHFLTRI CLALNTYG 

CCCACGTCGAGGAGGCCGAGCGCCCCGGCGTACAGGCCGCGGCGGCTGCGTTCGACGGAC 

20101 + + + + + + 20160 

GGGTGCAGCTCCTCCGGCTCGCGGGGCCGCATGTCCGGCGCCGCCGACGCAAGCTGCCTG 
15 GVDLLGLAGAYLGRRSREVS- 

TCGATGATCTCCATGGCGCGGATCTTCGGCGCGCCCGTCATGGTGCCGGCGGGGAACAGG 

20161 + + + + + + 20220 

AGCTACTAGAGGTACCGCGCCTAGAAGCCGCGCGGGCAGTACCACGGCCGCCCCTTGTCC 
15 EI IEMARI KPAGTMTGAPFL 

GCGGCGATGGTGTCGAAGGCATCGGTGTCCACCCGCGCCCGGCCGACGACCGTGGAGACC 

20221 + + + + + + 20280 

CGCCGCTACCACAGCTTCCGTAGCCACAGGTGGGCGCGGGCCGGCTGCTGGCACCTCTGG 
15 AAITDFADTDVRARGVVTSV- 

AGGTGCAGCACGTGGGAGTAGCCCTCCACGTCCAGCTGGTCGGGTACGTCGAGCGTGTTC 

20281 + + + + + + 20340 

TCCACGTCGTGCACCCTCATCGGGAGGTGCAGGTCGACCAGCCCATGCAGCTCGCACAAG 
15 LHLVHSYGEVDLQDPVDLTN 

GGCCGGGCGATCCGTCCGATGTCGTTGCGGCAGAGGTCCACCAGCATGGTGTGCTCGGCG 

20341 + + + + + + 20400 

CCGGCCCGCTAGGCAGGCTACAGCAACGCCGTCTCCAGGTGGTCGTACCACACGAGCCGC 
15 PRAIRGIDNRCLDVLMTHEA- 

ATCTCCTTGGGATCCGACCTCAGCCGGACTCCCGCGGCGATGCCGCCGTCCGCGCCGGAC 

20401 + + + + + + 20460 

TAGAGGAAGCCTAGGCTGGAGTCGGCCTGAGGGCGCCGCTACGGCGGCAGGCGCGGCCTG 
15 IEKPDSRLRVGAAIGGDAGS 

CGCGGCACCGTGCCCGCGATCGGCCGCATCGTGACCTCGCCGTCCTCGATGCGTACGAAC 

20461 + + + + + + 20520 

GCGCCGTGGCACGGGCGCTAGCCGGCGTAGCACTGGAGCGGCAGGAGCTACGCATGCTTG 
15 RPVTGAI PRMTVEGDE I RVF 

AGCTCGGGGCTGGCGCCGATCAGACGGTGCCCGTCGATGCCCGCCAGATACATGTACGGG 

20521 + + + + + + 20580 

TCGAGCCCCGAGCGCGGCTAGTCTGCCACGGGCAGCTACGGGCGGTCTATGTACATGCCC 
15 LEPSAGILRHGDIGALYMYP- 

GAGGCGTTCCGCCCGCGCAGGCGCTGGTAGACGTCCGCGGGGTCGGCCGTCGAGCGGATG 

20581 + + + + + + 20640 

CTCCGCAAGGCGGGCGCGTCCGCGACCATCTGCAGGCGCCCCAGCCGGCAGCTCGCCTAC 
15 SANRGRLRQYVDAPDATSRI- 

GAGAGCTCGTGACCGATCTGCACCTGGTAGATGTCGCCGACGGCGATGTGCTTCAGACAC

20641 + + + + + + 20700 

CTCTCGAGCACTGGCTAGACGTGGACCATCTACAGCGGCTGCCGCTACACGAAGTCTGTG 
15 SLEHGIQVQYIDGVAIHKLC 

CGCTCGACGTCGTTCGCGAACACTTCGGGGGCGCTGTCGTCGGTGACCGCGGAGGCGGGG 

20701 + + + + + + 20760 

GCGAGCTGCAGCAAGCGCTTGTGAAGCCCCCGCGACAGCAGCCACTGGCGCCTCCGCCCC 
15 REVDNAFVEPASDDTVASAP

AAGCCGTCTGCGGACGGATCGGGCCAGGCCTGCTCCACGTCGGCGAGGAGCCCGGTGACG 

20761 + + + + + + 20820 

TTCGGCAGACGCCTGCCTAGCCCGGTCCGGACGAGGTGCAGCCGCTCCTCGGGCCACTGC 

15 FGDAS PD PWAQEVDALLGTV 

GTCTCCGGCGCGAGGCCGGGCCAGTACGGGGACTCGTGGAGCAGCAGTTCGCATCGGCCG 

20821 + + + + + + 20880 

CAGAGGCCGCGCTCCGGCCCGGTCATGCCCCTGAGCACCTCGTCGTCAAGCGTAGCCGGC 
15 TEPALGPWYPSEHLLLECRG- 

GTGGCGAGATCGGTGACCACGCTGCCCCGGTGCAGGACCATGCGTACGTCCGGCAGGCCA 

20881 + + + + + + 20940 

CACCGCTCTAGCCACTGGTGCGACGGGGCCACGTCCTGGTACGCATGCAGGCCGTCCGGT 

15 TALDTVVSGRHLVMRVDPLG- 

GGCCGGTTCTCGATGAGGTGGGGCAGGTCCTCGATGTAGCGGGCCGTGTCGTACCCGAAG 

20941 + + + + + + 21000 

CCGGCCAAGAGCTACTCCACCCCGTCCAGGAGCTACATCGCCCGGCACAGCATGGGCTTC 
15 PRNEILHPLDEIYRATDYGF 

AACCCGAGGAACCCGAAGCGGAAGCCGGACGCGGACCCCTCGGCGTCGAACATGTCCCGC 

21001 + + + + + + 21060 

TTGGGCTCCTTGGGCTTCGCCTTCGGCCTGCGCCTGGGGAGCCGCAGCTTGTACAGGGCG 
15 FGLFGFRFGSASGEADFMDR 

ATGGCCCGCAGCAGCGGCCACAACCCGCCCGCGGTACGCAGCCGCAGCCCCTGGGGGCCG 

21061 + + + + + + 21120 

TACCGGGCGTCGTCGCCGGTGTTGGGCGGGCGCCATGCGTCGGCGTCGGGGACCCCCGGC 
15 MARLLPWLGGATRLRLGQPG- 

TCCTCCAGGAGCGCGCCGGCCCGCTCCAGGAGCAGGCCCCGCAGGGCGGGTACGCCCTCG 

21121 + + + + + + 21180 

AGGAGGTCCTCGCGCGGCCGGGCGAGGTCCTCGTCCGGGGCGTCCCGCCCATGCGGGAGC 
15 DELLAGARELLLGRLAPVGE- 

ACGCGCACCACCCGGTCGGTGACCGAGAGCGAGAGCAGCGCGCCGAAGCCGACGAACTGG 

21181 + + + + + + 21240

TGCGCGTGGTGGGCCAGCCACTGGCTCTCGCTCTCGTCGCGCGGCTTCGGCTGCTTGACC 
15 VRVVRDTVSLSLLAGFGVFQ 

TGCCTGCGGTCGCGGGCCGGGCCGGCCGCGGACTCCAGGAGGTAGACCTCGTCGGGGCCG 

21241 + + + + + + 21300

ACGGACGCCAGCGCCCGGCCCGGCCGGCGCCTGAGGTCCTCCATCTGGAGCAGCCCCGGC 
15 HRRDRAPGAASELLYVEDPG- 

AAGTGCTCGGCCAGCGCGCGGTAGGCGGGCAGGGCGCCCGTCTCCTTCACATCGAGGCGT 

21301 + + + + + + 21360 

TTCACGAGCCGGTCGCGCGCCATCCGCCCGTCCCGCGGGCAGAGGAAGTGTAGCTCCGCA 
15 FHEALARYAPLAGTEKVDLR 

CGTGTCCGCACCCGCACCGGGGCCGAGACCACGCACTGGTCGGTCATCCTGGGTCCTCCC 

21361 + + + + + + 21420 

GCACAGGCGTGGGCGTGGCCCCGGCTCTGGTGCGTGACCAGCCAGTAGGACCCAGGAGGG 
15-< RTRVRVPASVVCQDTM- 

GGATCACGTGGTGATGGCGTAGCGGTGTGCCACCTGACGGGCGGTCAGCACCGCCCGGTC 

21421 + + + + + + 21480 

CCTAGTGCACCACTACCGCATCGCCACACGGTGGACTGCCCGCCAGTCGTGGCGGGCCAG 

14-* * TTIAYRHAVQRATLVARD- 

GGGGCCGGAGCGGTTGTCGACGACGCGCGCGGCCTTCCAGCTGACGAAGGAGCCGGTGTG 

21481 + + + + + + 21540 

CCCCGGCCTCGCCAACAGCTGCTGCGCGCGCCGGAAGGTCGACTGCTTCCTCGGCCACAC 
14 PGSRNDVVRAAKWSVFSGTH- 
GGTCACGGGGTCGAGGTCGGTGTCCACGACGATGCCGGCGTGCGCGCCGGTCCGCTCCCT

21541 + + + + + + 21600 

CCAGTGCCCCAGCTCCAGCCACAGGTGCTGCTACGGCCGCACGCGCGGCCAGGCGAGGGA 
14 TVPDLDTDVV I GAHAGTRER- 

GAGCCGGGCGGCGACGGCCTCGCCGATGCCCTGCCGTTCCCCCTCGGCGCCGGCCAGCAG 

21601 + + + + + + 21660 

CTCGGCCCGCCGCTGCCGGAGCGGCTACGGGACGGCAAGGGGGAGCCGCGGCCGGTCGTC 
14 LRAAVAEG I GQREGEAGALL- 

GTCCATGCGCACGGTGACGGCGTCGCTGCCGTCGTCCTGCCGGTCGATGACGACCTGGTA 

21661 + + + + + + 21720 

CAGGTACGCGTGCCACTGCCGCAGCGACGGCAGCAGGACGGCCAGCTACTGCTGGACCAT 
14 DMRVTVADSGDDQRDIVVQY- 

GCCGAGGCAGCCGCCGACCCCGTCGAGGATCGCGGCCTCCAGCTCGGCGGGCTGGAGGGT 

21721 + + + + + + 21780 

CGGCTCCGTCGGCGGCTGGGGCAGCTCCTAGCGCCGGAGGTCGAGCCGCCCGACCTCCCA 
14 GLCGGVGDLIAAELEAPQLT- 

CACGTCGCCCAGGGGGATGCGGTCCGCGACCCGGCCGATGACCTGGATCCGCGGTCCCGG 

21781 + + + + + + 21840 

GTGCAGCGGGTCCCCCTACGCCAGGCGCTGGGCCGGCTACTGGACCTAGGCGCCAGGGCC 
14 VDGLP I RDAVRG I VQ I RPGP- 

CAGCGGCTCCCCGGGGCCCGCCGGGAGGATGCGGACCAGGTCCCCGGTGCGGTAGCGGAT 

21841 + + + + + + 21900 

GTCGCCGAGGGGCCCCGGGCGGCCCTCCTACGCCTGGTCCAGGGGCCACGCCATCGCCTA 
14 LPEGPGAPLIRVLDGTRYRI- 

CAGTGGTTTGATGCCGTCCACCAGCATGGTGAGGACGAGTTCGCCCTCTCCCGTGTCGCC 

21901 + + + + + + 21960 

GTCACCAAACTACGGCAGGTGGTCGTACCACTCCTGCTCAAGCGGGAGAGGGCACAGCGG 
14 LPKIGDVLMTLVLEGEGTDG- 

GACCACGGCGCCGGTGTCCGGTTCGACGAGTTCGGTCAAGTAGTTGGGCTGGGCGAGGTG 

21961 + + + + + + 22020 

CTGGTGCCGCGGCCACAGGCCAAGCTGCTCAAGCCAGTTCATCAACCCGACCCGCTCCAC 
14 VVAGTDPEVLETLYNPQALH- 

GAGCGCTCCGGTGTCCGCTCCGGTGGCGATGCACAGGGCTTCCTGGGAGCCGTAGAGCGT 

22021 + + + + + + 22080 

CTCGCGAGGCCACAGGCGAGGCCACCGCTACGTGTCCCGAAGGACCCTCGGCATCTCGCA 
14 LAGTDAGTAI CLAEQSGYLT- 

GGGCCGCACGACGGCTTGCGGCCAGAGGGTCGCCACGTTGTCGGCGAACTGCGGGGTGCA 

22081 + + + + + + 22140 

CCCGGCGTGCTGCCGAACGCCGGTCTCCCAGCGGTGCAACAGCCGCTTGACGCCCCACGT 
14 PRVVAQPWLTAVNDAFQ PTC- 

GATCTCACCCAGCGTGAGGAAGAGCTTCACGGGAAGCCGGGCCAGGTCGTAGCCGTAGTG

22141 + + + + + + 22200 

CTAGAGTGGGTCGCACTCCTTCTCGAAGTGCCCTTCGGCCCGGTCCAGCATCGGCATCAC 
14 IEGLTLFLKVPLRALDYGYH- 

CAGGGCCGCCTTGGCAAGGCTCAGGCACAGCGCCGGAGCACAGACGACGACCTCGACCTC 

22201 + + + + + + 22260 

GTCCCGGCGGAACCGTTCCGAGTCCGTGTCGCGGCCTCGTGTCTGCTGCTGGAGCTGGAG 
14 LAAKALS LCLAPACVVVEVE- 

CAGCTCCTCGATCAGCCGCAGCGCCTTACGGAATCCCACCCTGGGGGACTCGGGCCAGAT 

22261 + + + + + + 22320 

GTCGAGGAGCTAGTCGGCGTCGCGGAATGCCTTAGGGTGGGACCCCCTGAGCCCGGTCTA 
14 LEEILRLAKRFGVRPSEPWI- 
CTTGACGTGACAGGCCCCCAGCTCCGCTGCCACCGCGGTGAACACGTCCCCGAACGCGTA

22321 + + + + + + 22380 

GAACTGCACTGTCCGGGGGTCGAGGCGACGGTGGCGCCACTTGTGCAGGGGCTTGCGCAT 
14 KVH CAGL EAAVAT FVDG F A Y - 

CAGCTCCGACGGCCCCATCAGGCCCACGACGGGCATCCGCCCCCCGAACCTCGCTTCCAG 

22381 + + + + + + 22440 

GTCGAGGCTGCCGGGGTAGTCCGGGTGCTGCCCGTAGGCGGGGGGCTTGGAGCGAAGGTC 
14 LESPGMLGVVPMRGGFRAEL- 

CATGCGGCGCCAGGACTCCCGGACGGCGATGTTGCTGGTCGCGATGTCCTTCTCGCCGCG 

22441 + + + + + + 22500 

GTACGCCGCGGTCCTGAGGGCCTGCCGCTACAACGACCAGCGCTACAGGAAGAGCGGCGC 
14 MRRWSERVAINSTAIDKEGR- 

TGGGCACGGGGTGGCCGCCCCGGTGGTCCCGGTGGTCTCGTAGTAGATGCGTGCTTCGTG 

22501 + + + + + + 22560 

ACCCGTGCCCCACCGGCGGGGCCACCAGGGCCACCAGAGCATCATCTACGCACGAAGCAC
14 PCPTAAGTTGTTEYY I R A E H - 

CAGCGGGCCCGACAGGACGTCGTGCATCTCCCGCCGCAGGTCGTCCTTGGTGGTGAAGGG 

22561 + + + + + + 22620 

GTCGCCCGGGCTGTCCTGCAGCACGTAGAGGGCGGCGTCCAGCAGGAACCACCACTTCCC 
14 LPGSLVDHMERRLDDKTTFP- 

CAGGTCCGCCAGGTTCGCGGGGGTGACGGCCTCGACGTCCACGCCTGCCAGATGGCGGCG 

22621 + + + + + + 22680 

GTCCAGGCGGTCCAAGCGCCCCCACTGCCGGAGCTGCAGGTGCGGACGGTCTACCGCCGC 
14 LDALNAPTVAEVDVGALHRR- 

GTAGAACGGCGAGCGGCGGGTGACGTGGCGCAGTACGGCCGTCAGCCGTTCGCCCTCCCA 

22681 + + + + + + 22740 

CATCTTGCCGCTCGCCGCCCACTGCACCGCGTCATGCCGGCAGTCGGCAAGCGGGAGGGT 
14 YFPS RRTVHRLVATLREGEW- 

GCGCTCGCGGTCGGCGGCGGTGAGTTCGCCGCGGTAGAACGCGTCGCTCACCTGCCCGTA 

22741 + + + + + + 22800 

CGCGAGCGCCAGCCGCCGCCACTCAAGCGGCGCCATCTTGCGCAGCGAGTGGACGGGCAT 
14 RERDAATLEGRYFAD SVQGY- 

GGCGGACCAGAACTCGCTGTCCGCGTCGGGGTCCAGCGGCCCGGTCCCGCCGGGACCGGG 

22801 + + + + + + 22860 

CCGCCTGGTCTTGAGCGACAGGCGCAGCCCCAGGTCGCCGGGCCAGGGCGGCCCTGGCCC 
14 ASWFESDADPDLPGTGGPGP- 

CCGCCGGCCGTCTCTCACGGCTGTGCCTGGAGTTCGTTGAGCGCGAGGCCGACCCGCTCG 

22861 + + + + + + 22920 

GGCGGCCGGCAGAGAGTGCCGACACGGACCTCAAGCAACTCGCGCTCCGGCTGGGCGAGC 

14 -< RRGDRM- 

21-* * PQAQLENLALGVRE - 

TTGACCTCGTTGGAGGCCAGCACGTCCGAACGGCCGGTGAGCCGACGGTGTTCGTCGAGC 

22921 + + + + + + 22980 

AACTGGAGCAACCTCCGGTCGTGCAGGCTTGCCGGCCACTCGGCTGCCACAAGCAGCTCG 
21 NVENSALVDSRGTLRRHEDL 

AGTTCGATCATGTCCGTCATCCTCTCGACCAGGCGCGAGACGTTGGTGAGGCCCTCCTCG 

22981 + + + + + + 23040 

TCAAGCTAGTACAGGCAGTAGGAGAGCTGGTCCGCGCTCTGCAACCACTCCGGGAGGAGC 
21 LEIMDTMREVLRSVNTLGEE 

TCCTTGAGCGCGTCGCCCCGGTGCAGCGCGTGCACCGTCGCCGGGAAGCCGCTGCCCACC 

23041 + + + + + + 23100 

AGGAACTCGCGCAGCGGGGCCACGTCGCGCACGTGGCAGCGGCCCTTCGGCGACGGGTGG 
21 DKLADGRHLAHVTAPFGSGV 
AGGATCATCCGGTTGAGCAGGGCATTGACGGTCAGCTGAGCCCATACCTCGCCGGCGCTG

23101 + + + + + + 23160 

TCCTAGTAGGCCAACTCGTCCCGTAACTGCCAGTCGACTCGGGTATGGAGCGGCCGCGAC 
21 LIMRNLLANVTLQAWVEGAS- 

TAGCGGCGGGCGACCGAGATGATCCCCGCGACCTTGTTGCTCAGCGGCCGGTCGAAGCGC 

23161 + + + + + + 23220

ATCGCCGCCCGCTGGCTCTACTAGGGGCGCTGGAACAACGAGTCGCCGGCCAGCTTCGCG 
21 YRRAVS I IGAVKNSLPRDFR- 

AGATAACCGACTCCGGCACGCTCGATGAAGGTCTGCATGAGGCTGGCCGTGCCGAATCCG 

23221 + + + + + + 23280

TCTATTGGCTGAGGCCGTGCGAGCTACTTCCAGACGTACTCCGACCGGCACGGCTTAGGC 
21 LYGVGARE I FTQMLSATGFG 

TGCACGGGCGCCGCGAAGATGATCCCGTCCGCCGCGACCATCTTCGCCACGACCTCGGGC 

23281 + + + + + + 23340

ACGTGCCCGCGGCGCTTCTACTAGGGCAGGCGGCGCTGGTAGAAGCGGTGCTGGAGCCCG 
21 HVPAAFI IGDAAVMKAVVEP 

ACCCCGTCGGCCAGGGTGCAGGCCACCGGCCTGTCGTTGCAGTCCCCGCAGGGCCCGCAC 

23341 + + + + + + 23400 

TGGGGCAGCCGGTCCCACGTCCGGTGGCCGGACAGCAACGTCAGGGGCGTCCCGGGCGTG
21 VGDALTCAVPRDNCDGCPGC

CGCTCCATCCTGATCGAGCGCAGGTCGACGGCCTCGAAGTCGACGCCGCGGTTCTCTGCT 

23401 + + + + + + 23460 

GCGAGGTAGGACTAGCTCGCGTCCAGCTGCCGGAGCTTCAGCTGCGGCGCCAAGAGACGA 

21 REMRI SRLDVAEFDVGRNEA- 

ACGCGTGCCGCGTGCCGCAGTACGTCGGCGGTGTTGCCGTCACGTTCCGAACCGTTGATC 

23461 + + + + + + 23520

TGCGCACGGCGCACGGCGTCATGCAGCCGCCACAACGGCAGTGCAAGGCTTGGCAACTAG 
21 VRAAHRLVDATNGDRESGNI 

GCGAGGATCTTGAGTTGTGCGCTCACGAGGGGCCTCCTTGGTGAGTCAGGTGCGCTCGGC 

23521 + + + + + + 23580 

CGCTCCTAGAACTCAACACGCGAGTGCTCCCCGGAGGAACCACTCAGTCCACGCGAGCCG 
13-* * T R E A -

21-< ALIKLQASM- 

GGTCGGCTCGGGGGAACTGTCTGGCCGCCGCTGGTCCGGGAGCCGCAGGGCCGGCTCGGC 

23581 + + + + + + 23640 

CCAGCCGAGCCCCCTTGACAGACCGGCGGCGACCAGGCCCTCGGCGTCCCGGCCGAGCCG 

13 TPEPSSDPRRQDPLRLAPEA- 

GGGGGCGGGAGGAAGACCGCCCCGCGGCGGGCCGCCACGCTCGCCGAACCGGATGAGGGG 

23641 + + + + + + 23700 

CCCCCGCCCTCCTTCTGGCGGGGCGCCGCCCGGCGGTGCGAGCGGCTTGGCCTACTCCCC 

13 PAP PLGGRPPGGREGFRI LP- 

CTTCTCGACGAGATAGAAGCTGATGGTCGCCAGCACGACGCTGATCGAGATCGTGAAGAG 

23701 + + + + + + 23760

GAAGAGCTGCTCTATCTTCGACTACCAGCGGTCGTGCTGCGACTAGCTCTAGCACTTCTC 
13 KEVLYFS ITALVVS I S ITFL- 

GAACAGTTCCCAGAACCCCATGTCACCCCGGAATTCCGGCGTTGGCACGGGAGACTTGCC 

23761 + + + + + + 23820

CTTGTCAAGGGTCTTGGGGTACAGTGGGGCCTTAAGGCCGCAACCGTGCCCTCTGAACGG 
13 FLEWFGMDGRFE PTPVPSKG- 

GAAGATGCTGCCGTTCCTGAGCCAGAGGTTGATCACGATCTCGTGCCAGAGGTAGACGCC 

23821 + + + + + + 23880 

CTTCTACGACGGCAAGGACTCGGTCTCCAACTAGTGCTAGAGCACGGTCTCCATCTGCGG 
13 FISGNRLWLNIVIEHWLYVG- 

GAGGGAGATCTGGCCGAGGAAGAGGATCGGCTTGCTGGTGAAGAGCGCGTCCGAGAACCG 
23881 + + + + + + 23940

CTCCCTCTAGACCGGCTCCTTCTCCTAGCCGAACGACCACTTCTCGCGCAGGCTCTTGGC 
13 LS IQGLFLI PKSTFLADS F R - 

GGACTCGGCGCCGGGGACCGTCATCGGTGCCAGGAGCAGCAGGGTGAAGGAGGTCAGGAT 

23941 + + + + + + 24000 

CCTGAGCCGCGGCCCCTGGCAGTAGCCACGGTCCTCGTCGTCCCACTTCCTCCAGTCCTA 
13 S EAGPVTMPALLLLTF STL I - 

GAAGTGGTCGACGAGCTCCTGGGCCAGGGCCGCGTTGTCGCCCATGCCCGGGATGCCGAT 

24001 + + + + + + 24060 

CTTCACCAGCTGCTCGAGGACCCGGTCCCGGCGCAACAGCGGGTACGGGCCCTACGGCTA 
13 FHDVLEQALAANDGMGP I G I - 

GGGCTTGGTGGCGTAGAGGAGGTACAGCGGGATGAGCGGGACCCAGCAGATCAGCGGGCG 

24061 + + + + + + 24120 

CCCGAACCACCGCATCTCCTCCATGTCGCCCTACTCGCCCTGGGTCGTCTAGTCGCCCGC 
13 PKTAYLLYLP ILPVWCILPR-

CCGGATCACGAAACGGTAGAAGCCCGGGGTCCCTGGCGTCGCCTCGGCGTACGCGGAGTA 

24121 + + + + + + 24180 

GGCCTAGTGCTTTGCCATCTTCGGGCCCCAGGGACCGCAGCGGAGCCGCATGCGCCTCAT 
13 RIVFRYFGPTGPTAEAYASY- 

GATGGCCAGTGCCATGCCCGCGGCGAAGCAGCCGGCGTAGTAGGGCGGCCAGTACCACTG 

24181 + + + + + + 24240 

CTACCGGTCACGGTACGGGCGCCGCTTCGTCGGCCGCATCATCCCGCCGGTCATGGTGAC 
13 IALAMGAAFCGAYY P PWYWQ- 

CATCGTCGCGCCGGTGGAGGGGAGGTTGGTGTACGTGACCCAGCCGATGGCCATGACTTC 

24241 + + + + + + 24300 

GTAGCAGCGCGGCCACCTCCCCTCCAACCACATGCACTGGGTCGGCTACCGGTACTGAAG 
13 MTAGTS PLNTYTVWG IAMVE- 

CAGCGCGGCCAGCGGCAGCAGGAGGCGGCGTGCCTTCTGCCCGGGAGTGCTGCCGCCCCG 

24301 + + + + + + 24360 

GTCGCGCCGGTCGCCGTCGTCCTCCGCCGCACGGAAGACGGGCCCTCACGACGGCGGGGC 
13 LAAL PLLLRRAKQGPT SGGR- 

CGCGAGCCGGTGGCCGATCCAGGCGATCAGCGGCAGGGCGAGGTAGAACGTGAACTCGGC 

24361 + + + + + + 24420 

GCGCTCGGCCACCGGCTAGGTCCGCTAGTCGCCGTCCCGCTCCATCTTGCACTTGAGCCG 
13 ALRHGIWAILPLALYFTFEA- 

GGGGACCGTCCAGGTGGGCTCGATGCCGTGCATCGGCTGGCCCTCGGGCAGATAGAAGTG 

24421 + + + + + + 24480 

CCCCTGGCAGGTCCACCCGAGCTACGGCACGTAGCCGACCGGGAGCCCGTCTATCTTCAC 
13 PVTWTPE IGHMPQGEPLYFH- 

CATGAGCAGCACGGGCCGCAGGACGTCGCTGACGCTGTCGATCTCGAACCAGTTGTAGCC 

24481 + + + + + + 24540 

GTACTCGTCGTGCCCGGCGTCCTGCAGCGACTGCGACAGCTAGAGCTTGGTCAACATCGG 
13 MLLVPRLVDSVSDIEFWNYG- 

GGGGATTGCGAAGACGAGCAACAGGTAGTAGGCGGGCAGGATGCGCAGGGCCCGGCGTTT

24541 + + + + + + 24600 

CCCCTAACGCTTCTGCTCGTTGTCCATCATCCGCCCGTCCTACGCGTCCCGGGCCGCAAA 
13 P IAFVLLLYYAPLI RLARRK- 

GAGGAACCGTCCGGTGGCGGGCCGCTTCGTCCCACTGATGGTGACGCGGGCGTAGGGCTT 

24601 + + + + + + 24660 

CTCCTTGGCAGGCCACCGCCCGGCGAAGCAGGGTGACTACCACTGCGCCCGCATCCCGAA 
13 LFRGTAPRKTGS I TVRAYPK- 

GTACAGCATCATTCCGGACAGAGCGAAGAAGGGGGAAGGCATACCCCCAGACCGTCCGCG 
24661 + + + + + + 24720 
CATGTCGTAGTAAGGCCTGTCTCGCTTCTTCCCCCTTCCGTATGGGGGTCTGGCAGGCGC
13 -< YLMMGSLAFFPSPM- 

AGGACGCCCCAGAACGGTTTGCCCGGCTCACCGACGAAGCTGCCCACTCCGGCCTGGAAG 

24721 + + + + + + 24780 

TCCTGCGGGGTCTTGCCAAACGGGCCGAGTGGCTGCTTCGACGGGTGAGGCCGGACCTTC 

GCGACGTGGTAGACGACCACACCCAGCGCGAGGACACCTCGCAGTCCCTCGAACTTCGGT 

24781 + + + + + + 24840 

CGCTGCACCATCTGCTGGTGTGGGTCGCGCTCCTGTGGAGCGTCAGGGAGCTTGAAGCCA 

ATTCGCTTGCTTTTTGCGCCACCTGCGTCGCGAAGGACGTCCCCCATGGAACAGTCCCCT 

24841 + + + + + + 24900 

TAAGCGAACGAAAAACGCGGTGGACGCAGCGCTTCCTGCAGGGGGTACCTTGTCAGGGGA 

TTCCCTTGGCACTTGCTCGTTGACTTCCCGAAATAGTCGGGTCTGCGGAGTGTGAGCCGC 

24901 + + + + + + 24960

AAGGGAACCGTGAACGAGCAACTGAAGGGCTTTATCAGCCCAGACGCCTCACACTCGGCG 

ATCTCCAATCGTGCTGTTCCGGTGCTCAGGACGACTTGTTTCGGCCTGAGTGGGAAGGCA 

24961 + + + + + + 25020 

TAGAGGTTAGCACGACAAGGCCACGAGTCCTGCTGAACAAAGCCGGACTCACCCTTCCGT 
12-* *SSKNRGSHSP 

GCCACCCCCGCCGCCCCGCCTCGGCCAGACCGGGGGCCGAGGAGTCCCGTTCCGAGAGGA 

25021 + + + + + + 25080 

CGGTGGGGGCGGCGGGGCGGAGCCGGTCTGGCCCCCGGCTCCTCAGGGCAAGGCTCTCCT 
12 LWGRRGAEALGPASSDRESL 

TCGGAGTGATCTCCGGCGGCCAGGCGATGCCCACCTCCGGATCCAGCGGATTCAAGCCAT 

25081 + + + + + + 25140 

AGCCTCACTAGAGGCCGCCGGTCCGCTACGGGTGGAGGCCTAGGTCGCCTAAGTTCGGTA 
12 IPTIEPPWAIGVEPDLPNLG 

GTTCGAGCCGGGGGTCGTAGGCCGCCGAGCACAGGTAGACGATCACCGCCTCGTCGCTCA 

25141 + + + + + + 25200 

CAAGCTCGGCCCCCAGCATCCGGCGGCTCGTGTCCATCTGCTAGTGGCGGAGCAGCGAGT 
12 HELRPDYAASCLYVIVAEDS 

GCGTGAGGAATCCGAAGCCCAGCCCCGCGGAGACGTACAGCGCCCGTCCGTTCTCCTCGC 

25201 + + + + + + 25260 

CGCACTCCTTAGGCTTCGGGTCGGGGCGCCTCTGCATGTCGCGGGCAGGCAAGAGGAGCG 
12 LTLFGFGLGASVYLARGNEE 

CGAGCTCCACGGTCCGCCAGCCGCCGAAGGTGGGCGACCCCACCCGGATGTCGACCACGG 

25261 + + + + + + 25320 

GCTCGAGGTGCCAGGCGGTCGGCGGCTTCCACCCGCTGGGGTGGGCCTACAGCTGGTGCC 
12 GLEVTRWGGFTPSGVRI DVV 

CGCCGAACACGCTGCCGCGCAGGCAGCTGAAGTACTTGGCCTGGCCGGGTACGCCCCCGG 

25321 + + + + + + 25380 

GCGGCTTGTGCGACGGCGCGTCCGTCGACTTCATGAACCGGACCGGCCCATGCGGGGGCC 
12 AGFVSGRLCSFYKAQGPVGG 

CGAAGTGGATGCCCCGCAGCACCCCGTGGGAGGAGATCGCGCAGTTCGCCTGCCGCAGGT 

25381 + + + + + + 25440 

GCTTCACCTACGGGGCGTCGTGGGGCACCCTCCTCTAGCGCGTCAAGCGGACGGCGTCCA 
12 AFHIGRLVGHSSIACNAQRL 

CGAAGGAGTGGCCTACGGTGCGGCGGAAGGGCTCGCCCTGGAACCACTCGCGAAACGAGC 

25441 + + + + + + 25500 

GCTTCCTCACCGGATGCCACGCCGCCTTCCCGAGCGGGACCTTGGTGAGCGCTTTGCTCG 
12 DFSHGVTRRFPEGQFWERFS 

CCCGTTCGTCACGGAAGACCTGCTTCTCCTCCGTCCACGCTCCCGAGATCCCGATCGGCT 
25501 + + + + + + 25560 
GGGCAAGCAGTGCCTTCTGGACGAAGAGGAGGCAGGTGCGAGGGCTCTAGGGCTAGCCGA
12 GREDRFVQKEETWAGS I GI P

TCATCGCTGGCCCCTTCTCTCGACTTCTCTCGACGACTCGCGGGAGGCGGCCGAGGGGTC 

25561 + + + + + + 25620 

AGTAGCGACCGGGGAAGAGAGCTGAAGAGAGCTGCTGAGCGCCCTCCGCCGGCTCCCCAG 

12-< K M -

CGCCGGGCCCGTGGGAACGCCGCAGTCTAGATGCGGCGGCACCGGGGGCAGGGGGGTGCG 

25621 + + + + + + 25680 

GCGGCCCGGGCACCCTTGCGGCGTCAGATCTACGCCGCCGTGGCCCCCGTCCCCCCACGC 

GACGACGTCCGCCCCACCTCAGCACACCGGGAGATGCAGGTCGGTGACGGGCGACGTGAC 

25681 + + + + + + 25740 

CTGCTGCAGGCGGGGTGGAGTCGTGTGGCCCTCTACGTCCAGCCACTGCCCGCTGCACTG 

GATGCAACGGTCCGAGGCCCGGTTGCCCGGACGACGGCCCACAGAGCCATCGGAGCAACG 

25741 + + + + + + 25800 

CTACGTTGCCAGGCTCCGGGCCAACGGGCCTGCTGCCGGGTGTCTCGGTAGCCTCGTTGC 

GAGGCGGACCGCAGATGACCAAGCACGCCCGTGACCGCGCGGTAGTCCTCGGCGCAGGGA 

25801 + + + + + + 25860 

CTCCGCCTGGCGTCTACTGGTTCGTGCGGGCACTGGCGCGCCATCAGGAGCCGCGTCCCT 
20-> MTKHARDRAVVLGAGM-

TGGCGGGGCTGCTCGCCGCGCGCGTCCTGTCCGAGACGTACAAGGAAGTGCTGGTGATCG 

25861 + + + + + + 25920 

ACCGCCCCGACGAGCGGCGCGCGCAGGACAGGCTCTGCATGTTCCTTCACGACCACTAGC 
20 AGLLAARVLSETYKEVLVID-

ACCGGGACCGGTTGGGCGGCACGGAGCAGCGCCGCGGTGTCCCGCACGGACGCCACGCCC 

25921 + + + + + + 25980 

TGGCCCTGGCCAACCCGCCGTGCCTCGTCGCGGCGCCACAGGGCGTGCCTGCGGTGCGGG 
20 RDRLGGTEQRRGVPHGRHAH-

ATGCGCTGCTGGCCAAGGGACAGCAGATCCTCAACGAACTCTTCCCCGGACTCGACACCG 

25981 + + + + + + 26040

TACGCGACGACCGGTTCCCTGTCGTCTAGGAGTTGCTTGAGAAGGGGCCTGAGCTGTGGC 
20 ALLAKGQQI LNELFPGLDTE-

AACTCACCTCGGCCGGAATCCCCGCCGGGGACATCGCCGGGAACCTGCGGTGGTACTTCA 

26041 + + + + + + 26100 

TTGAGTGGAGCCGGCCTTAGGGGCGGCCCCTGTAGCGGCCCTTGGACGCCACCATGAAGT 

20 LTSAGI PAGD IAGNLRWYFN-

ACGGCCGCCGGCTCCAGCCCTTCGACACCGGGCTGATCAGCGTCTCGGCGACGAGGCCCG 

26101 + + + + + + 26160 

TGCCGGCGGCCGAGGTCGGGAAGCTGTGGCCCGACTAGTCGCAGAGCCGCTGCTCCGGGC 
20 GRRLQPFDTGLI SVSATRPE-

AGCTGGAGTCCCACGTGCGCGCACGGGTCGCCGCGCTGCCACAGGTGAAGATCATGGACG 

26161 + + + + + + 26220 

TCGACCTCAGGGTGCACGCGCGTGCCCAGCGGCGCGACGGTGTCCACTTCTAGTACCTGC 
20 L E SHVRARVAAL PQVKI MDG-

GGTGCGTGATCCGGGGCCTGACCGCCTCGGCCGACCGCAGCCGCGTCACCGGTGTCGAGG 

26221 + + + + + + 26280

CCACGCACTAGGCCCCGGACTGGCGGAGCCGGCTGGCGTCGGCGCAGTGGCCACAGCTCC 
20 CV I RGLTASADRS RVTGVEV-

TGGTCGACGAGTCGGGTACGGACACCCCGACGCGCCTGGAGGCCGACCTCGTCGTCGACG 

26281 + + + + + + 26340 

ACCAGCTGCTCAGCCCATGCCTGTGGGGCTGCGCGGACCTCCGGCTGGAGCAGCAGCTGC 

20 VDESGTDTPTRLEADLVVDV-
TCACGGGGCGCGGCTCGCGGACTCCCGCCTGGCTGGAGGAGTTCGGATACGAGCGGCCCG 
26341 + + + + + + 26400

AGTGCCCCGCGCCGAGCGCCTGAGGGCGGACCGACCTCCTCAAGCCTATGCTCGCCGGGC 
20 TGRGSRTPAWLEEFGYERPA- 

CGGAGGACCGCTTCAAGATCGATCTGGCGTACACCACGCGCCACTTCAAGCTCAAGGAAG 

26401 + + + + + + 26460 

GCCTCCTGGCGAAGTTCTAGCTAGACCGCATGTGGTGCGCGGTGAAGTTCGAGTTCCTTC 
20 EDRFKIDLAYTTRHFKLKED- 

ACCCCTACGGCACGGACCTGTCGATCAACCCGGTGGCATCGCCGAGCAACCCGCGCGGCG 

26461 + + + + + + 26520 

TGGGGATGCCGTGCCTGGACAGCTAGTTGGGCCACCGTAGCGGCTCGTTGGGCGCGCCGC 
20 PYGTDLS I NPVAS PSNPRGA- 

CGTTCTTCCCCCGGCTCGCGGACGGCAGCTCCCAGCTCTCCCTCACCGGAATCCTCGGCG 

26521 + + + + + + 26580 

GCAAGAAGGGGGCCGAGCGCCTGCCGTCGAGGGTCGAGAGGGAGTGGCCTTAGGAGCCGC 
20 FFPRLADGS SQLSLTG I LGD- 

ACCACCCGCCCACCGACGACGAGGGCTTCCTGGCGTTCGCCAAGTCGCTTGCCGCGCCGG 

26581 + + + + + + 26640 

TGGTGGGCGGGTGGCTGCTGCTCCCGAAGGACCGCAAGCGGTTCAGCGAACGGCGCGGCC 
20 HP PTDDEGFLAFAKS LAAPE- 

AGATCTACCGGGCCGTCCGCGATGCCGAACCTCTCGACGAACCGGTCACCTTCCGCTTCC 

26641 + + + + + + 26700 

TCTAGATGGCCCGGCAGGCGCTACGGCTTGGAGAGCTGCTTGGCCAGTGGAAGGCGAAGG 
20 I YRAVRDAEPLDEPVTFRFP- 

CGGCGAGCGTCCGCCGCCGTTACGAGAGGCTGCGCCGTTTCCCCGGCGGGTTCCTCGTCA 

26701 + + + + + + 26760 

GCCGCTCGCAGGCGGCGGCAATGCTCTCCGACGCGGCAAAGGGGCCGCCCAAGGAGCAGT 
20 ASVRRRYERLRRFPGGFLVM- 

TGGGCGACGGCGTGTGCAGCTTCAACCCCGTCTACGGCCAGGGCATGACGGTCGCCGCCC 

26761 + + + + + + 26820 

ACCCGCTGCCGCACACGTCGAAGTTGGGGCAGATGCCGGTCCCGTACTGCCAGCGGCGGG 
20 GDGVC S FNPVYGQGMTVAAL- 

TGGAGGCCGTGGCGCTGCGGGACCACTTGCGCGACGCCCCGGACCCCGACGCCCTGCGCT 

26821 + + + + + + 26880 

ACCTCCGGCACCGCGACGCCCTGGTGAACGCGCTGCGGGGCCTGGGGCTGCGGGACGCGA 
20 EAVALRDHLRDAPDPDALRF- 

TCTTCCGGCGTATCTCCACGGTCATCGACGTTCCGTGGGACATCGCCGCCGGAGCGGATC 

26881 + + + + + + 26940 

AGAAGGCCGCATAGAGGTGCCAGTAGCTGCAAGGCACCCTGTAGCGGCGGCCTCGCCTAG 
20 FRRI STVIDVPWDIAAGADL- 

TGAACTTCCCCGGGGTGGAGGGCCCCCGCACCATGAAGGTGAAGATGGCCAACGCCTACA 

26941 + + + + + + 27000 

ACTTGAAGGGGCCCCACCTCCCGGGGGCGTGGTACTTCCACTTCTACCGGTTGCGGATGT 
20 NF PGVEG PRTMKVKMANAYM- 

TGGCCCGCCTGCACGCAGCGGCAGCCGTCGACGGCGCGGTGACCGGGGCGTTCTTCCGGG 

27001 + + + + + + 27060 

ACCGGGCGGACGTGCGTCGCCGTCGGCAGCTGCCGCGCCACTGGCCCCGCAAGAAGGCCC 
20 ARLHAAAAVDGAVTGAF FRV- 

TGGCCGGGCTGGTGGACCCCCCGCAGGCCCTGATGCGCCCCTCCCTCGCCCTGCGGGTCA 

27061 + + + + + + 27120

ACCGGCCCGACCACCTGGGGGGCGTCCGGGACTACGCGGGGAGGGAGCGGGACGCCCAGT 
20 AGLVD P PQALMR P S LAL RVM- 

TGCGCAACTCCTCGGCGAAGCCGTCGGTCCCTTCGGGCGCCGCCGTATGACCGCGCGGCC 
27121 + + + + + + 27180 
ACGCGTTGAGGAGCCGCTTCGGCAGCCAGGGAAGCCCGCGGCGGCATACTGGCGCGCCGG
20-* RNS SAKPSVPSGAAV*- 

CGTCCGGGGCGGCTGCCGGGGCCAGGAGCCGACATGCGGGTGATGATCACGGTGTTCCCG 

27181 + + + + + + 27240 

GCAGGCCCCGCCGACGGCCCCGGTCCTCGGCTGTACGCCCACTACTAGTGCCACAAGGGC 
19 MRVMITVFP 

GCGCGGGCGCACTTCCTGCCGCTGGTGCCCTATGCCTGGGCCCTGCAGAGCGCGGGCCAC 

27241 + + + + + + 27300 

CGCGCCCGCGTGAAGGACGGCGACCACGGGATACGGACCCGGGACGTCTCGCGCCCGGTG 
19 ARAHFLPLVPYAWALQSAGH 

GAGGTATGTGTCGTGGCGCCCCCGGGCTATCCCACCGGGGTGGCCGACCCCGACTTCCAC 

27301 + + + + + + 27360 

CTCCATACACAGCACCGCGGGGGCCCGATAGGGTGGCCCCACCGGCTGGGGCTGAAGGTG 
19 EVCVVAP PGYPTGVADPDFH 

GAGGCCGTCACCGCGGCCGGCCTGAAGTCGGTGACCTGCGGGCAGCCGCAGCCGCTGGCG 

27361 + + + + + + 27420 

CTCCGGCAGTGGCGCCGGCCGGACTTCAGCCACTGGACGCCCGTCGGCGTCGGCGACCGC 
19 EAVTAAGLKSVTCGQPQPLA 

GTCCACGACCGCGACGACCCCGGCTACGCGGCGATGCTGCCGACCGCGGCGGAGTCGGAG 

27421 + + + + + + 27480 

CAGGTGCTGGCGCTGCTGGGGCCGATGCGCCGCTACGACGGCTGGCGCCGCCTCAGCCTC 
19 VHDRDDPGYAAMLPTAAESE 

CGCTACGTGGCGGCCCTCGGGATCAGCGAGAAGGAGCGCCCCACCTGGGACGTCTTCTAC 

27481 + + + + + + 27540 

GCGATGCACCGCCGGGAGCCCTAGTCGCTCTTCCTCGCGGGGTGGACCCTGCAGAAGATG 
19 RYVAALGISEKERPTWDVFY 

CACTTCACCTTGCTGGCGATCCGCGACTACCATCCGCCGCGGCCGCGGCAGGACGTGGAC 

27541 + + + + + + 27600 

GTGAAGTGGAACGACCGCTAGGCGCTGATGGTAGGCGGCGCCGGCGCCGTCCTGCACCTG 
19 HFTLLAIRDYHPPRPRQDVD 

CAGGTGATCGAGTTCGCCCGGATCTGGCAGCCCGATCTGGTGCTGTGGGACGCCTGGTTC 

27601 + + + + + + 27660 

GTCCACTAGCTCAAGCGGGCCTAGACCGTCGGGCTAGACCACGACACCCTGCGGACCAAG 
19 QVIEFARIWQPDLVLWDAWF 

CCCTCGGGCGCGATCGCGGCGCGGGTCAGCGGCGCCGCGCACGCGCGGGTGCTCGTAGCC 

27661 + + + + + + 27720 

GGGAGCCCGCGCTAGCGCCGCGCCCAGTCGCCGCGGCGCGTGCGCGCCCACGAGCATCGG 
19 PSGAIAARVSGAAHARVLVA 

CCCGACTACACCGGCTGGGTCACCGAGCGGTTCGCCGCCGCGGGCCCCGCGGCGGGGGCC 

27721 + + + + + + 27780 

GGGCTGATGTGGCCGACCCAGTGGCTCGCCAAGCGGCGGCGCCCGGGGCGCCGCCCCCGG 
19 PDYTGWVTERFAAAGPAAGA 

GACCTCCTGGCCGAGACGATGCGGCCGCTGGCCGAGCGGTACGGCGTGGAGGTCGACGAC 

27781 + + + + + + 27840 

CTGGAGGACCGGCTCTGCTACGCCGGCGACCGGCTCGCCATGCCGCACCTCCAGCTGCTG 
19 DLLAETMRPLAERYGVEVDD 

GATCTTCTGCTCGGACAGTGGACGGTCAATCCGTTCCCGGCGCCGATGAACCCGCCGACC 

27841 + + + + + + 27900 

CTAGAAGACGAGCCTGTCACCTGCCAGTTAGGCAAGGGCCGCGGCTACTTGGGCGGCTGG 
19 DLLLGQWTVNPFPAPMNPPT 

CGGCTCACGAACGTTCCGGTGCGCTACGTGCCCTACACCGGTGCCAGCGTCATGCCCGCG 

27901 + + + + + + 27960 

GCCGAGTGCTTGCAAGGCCACGCGATGCACGGGATGTGGCCACGGTCGCAGTACGGGCGC 
19 RLTNVPVRYVPYTGASVMPA



TGGCTGTACGCGCGGCCGTCGCGGCCGCGGGTGGCGCTGTCGCTCGGAGTGTCCGCGCGG 

27961 + + + + + + 28020

ACCGACATGCGCGCCGGCAGCGCCGGCGCCCACCGCGACAGCGAGCCTCACAGGCGCGCC 
19 WLYARPSRPRVALSLGVSAR

GCGTTCCTCAAGGGTGACTGGGGGCGTACCGCCAAACTGCTGGAAGCGGTCGCGGAGCTG 

28021 + + + + + + 28080 

CGCAAGGAGTTCCCACTGACCCCCGCATGGCGGTTTGACGACCTTCGCCAGCGCCTCGAC 
19 AFLKGDWGRTAKLLEAVAEL

GACATCGAGGTGATCGCCACGCTCAACGACAACCAACTGGCGGAGAGCGGGCCGCTGCCG 

28081 + + + + + + 28140 

CTGTAGCTCCACTAGCGGTGCGAGTTGCTGTTGGTTGACCGCCTCTCGCCCGGCGACGGC 
19 DIEVIATLNDNQLAESGPLP

GACAACGTCCACACCCTCGACTACGTACCGCTCGACCAGTTGCTGCCCACCTGCTCGGCC 

28141 + + + + + + 28200 

CTGTTGCAGGTGTGGGAGCTGATGCATGGCGAGCTGGTCAACGACGGGTGGACGAGCCGG 
19 DNVHTLDYVPLDQLLPTCSA

GTCATCCACCACGGATCGACGGGCACCTTCGCCGCGGCGAGCGCGGCCGGGCTGCCCCAG 

28201 + + + + + + 28260 

CAGTAGGTGGTGCCTAGCTGCCCGTGGAAGCGGCGCCGCTCGCGCCGGCCGGACGGGGTC 
19 VIHHGSTGTFAAASAAGLPQ

GTGGTCTGCGACACCGACGAGCCCCTCCTGCTCTTCGGCGAGGACACCCCCGACGGCATC 

28261 + + + + + + 28320

CACCAGACGCTGTGGCTGCTCGGGGAGGACGAGAAGCCGCTCCTGTGGGGGCTGCCGTAG 
19 VVCDTDEPLLLFGEDTPDGI

GCGTGGGACTTCACCTGCCAGAAGCAGCTCACCGCGACGCTCACCTCCCGCGTGGTCACC 

28321 + + + + + + 28380 

CGCACCCTGAAGTGGACGGTCTTCGTCGAGTGGCGCTGCGAGTGGAGGGCGCACCAGTGG 
19 AWDFTCQKQLTATLTSRVVT

GACTACGGGGCGGGGGTGCGCGTCGACCACCAGAAGCAGTCCGCCGGACAGATCCGTGAG 

28381 + + + + + + 28440 

CTGATGCCCCGCCCCCACGCGCAGCTGGTGGTCTTCGTCAGGCGGCCTGTCTAGGCACTC 
19 DYGAGVRVDHQKQSAGQ I RE

CAACTACGCAGGGTGCTCACCGAACCTTCCTTCCGCGAGGGCGCTCGACGGATCCGGGAA 

28441 + + + + + + 28500 

GTTGATGCGTCCCACGAGTGGCTTGGAAGGAAGGCGCTCCCGCGAGCTGCCTAGGCCCTT 
19 QLRRVLTEPSFREGARRIRE

GACCGGAATTCCGCCCCCAGCCCGGTCGAACTCGTATCGCTCCTGGTAGAACTGACGAAG 

28501 + + + + + + 28560 

CTGGCCTTAAGGCGGGGGTCGGGCCAGCTTGAGCATAGCGAGGACCATCTTGACTGCTTC 
19 DRNSAPSPVELVSLLVELTK

CGTCATCGCCGTGACAAGGAGGCGGACCGATGAGGATGCTGGTGACGGGCGGAGCGGGTT 

28561 + + + + + + 28620 

GCAGTAGCGGCACTGTTCCTCCGCCTGGCTACTCCTACGACCACTGCCCGCCTCGCCCAA 

19-* RHRRDKEADR*-

1-> MRMLVTGGAGF-

TCATCGGCTCGCAGTTCGTGCGGGCCACACTGCACGGCGAGCTGCCGGGTTCCGAGGACG 

28621 + + + + + + 28680

AGTAGCCGAGCGTCAAGCACGCCCGGTGTGACGTGCCGCTCGACGGCCCAAGGCTCCTGC 
1 IGSQFVRATLHGELPGSEDA-

CCCGGGTGACGGTCCTGGACAAGCTGACGTACTCCGGCAATCCGGCCAACCTCACCTCCG 

28681 + + + + + + 28740 

GGGCCCACTGCCAGGACCTGTTCGACTGCATGAGGCCGTTAGGCCGGTTGGAGTGGAGGC 
1 RVTVLDKLTYSGNPANLTSV-

TCGCGGCCCATCCGCGGTACACCTTCGTCCAGGGCGACACCGTCGACCCGCGCGTCGTCG 

28741 + + + + + + 28800 

AGCGCCGGGTAGGCGCCATGTGGAAGCAGGTCCCGCTGTGGCAGCTGGGCGCGCAGCAGC 
1 AAH PRYTFVQGDTVD P RVVD- 

ACGAGGTGGTCGCCGGCCACGACGTCATCGTCCACTTCGCGGCGGAGTCGCACGTGGACC 

28801 + + + + + + 28860 

TGCTCCACCAGCGGCCGGTGCTGCAGTAGCAGGTGAAGCGCCGCCTCAGCGTGCACCTGG 
1 EVVAGHDVIVHFAAES HVDR- 

GCTCGATCGACACCGCCACCCGGTTCGTCACGACCAACGTGCTCGGGACCCAGACGCTGC 

28861 + + + + + + 28920 

CGAGCTAGCTGTGGCGGTGGGCCAAGCAGTGCTGGTTGCACGAGCCCTGGGTCTGCGACG 
1 SIDTATRFVTTNVLGTQTLL- 

TGGAAGCGGCTCTCCGGCACGGGGTCGGCCGGTTCGTGCACGTGTCGACCGACGAGGTCT 

28921 + + + + + + 28980 

ACCTTCGCCGAGAGGCCGTGCCCCAGCCGGCCAAGCACGTGCACAGCTGGCTGCTCCAGA 
1 EAALRHGVGRFVHVS TDEVY- 

ACGGGTCGATCGCCTCCGGCTCATGGACCGAGGACACCCCGCTCGCCCCCAACGTCCCCT 

28981 + + + + + + 29040 

TGCCCAGCTAGCGGAGGCCGAGTACCTGGCTCCTGTGGGGCGAGCGGGGGTTGCAGGGGA 
1 GSIASGSWTEDTPLAPNVPY- 

ACGCGGCGTCGAAGGCGGGTTCGGACCTGATGGCGCTCGCCTGGCACCGCACCCGGGGCC 

29041 + + + + + + 29100 

TGCGCCGCAGCTTCCGCCCAAGCCTGGACTACCGCGAGCGGACCGTGGCGTGGGCCCCGG 
1 AASKAGSDLMALAWHRTRGL- 

TGGACGTCGTCGTCACCCGGTGCACCAACAACTACGGTCCCTACCAGTACCCCGAGAAGG 

29101 + + + + + + 29160 

ACCTGCAGCAGCAGTGGGCCACGTGGTTGTTGATGCCAGGGATGGTCATGGGGCTCTTCC 
1 DVVVTRCTNNYGPYQYPEKV- 

TGATCCCGCTCTTCGTCACCAACATCCTCGACGGCTTGCGGGTGCCCCTGTACGGGGACG 

29161 + + + + + + 29220 

ACTAGGGCGAGAAGCAGTGGTTGTAGGAGCTGCCGAACGCCCACGGGGACATGCCCCTGC 
1 IPLFVTNILDGLRVPLYGDG- 

GCGCCCACCGCCGGGACTGGCTGCACGTGTCCGACCACTGCCGGGCCATCCAGATGGTCA 

29221 + + + + + + 29280 

CGCGGGTGGCGGCCCTGACCGACGTGCACAGGCTGGTGACGGCCCGGTAGGTCTACCAGT 
1 AHRRDWLHVSDHCRAI QMVM- 

TGAACTCCGGCCGGGCCGGGGAGGTCTACCACATCGGCGGCGGCACCGAACTCTCCAACG 

29281 + + + + + + 29340 

ACTTGAGGCCGGCCCGGCCCCTCCAGATGGTGTAGCCGCCGCCGTGGCTTGAGAGGTTGC 
1 NSGRAGEVYHIGGGTELSNE- 

AGGAACTCACCGGCCTGTTGCTCACGGCGTGCGGCACCGACTGGTCCTGCGTGGACCGGG 

29341 + + + + + + 29400 

TCCTTGAGTGGCCGGACAACGAGTGCCGCACGCCGTGGCTGACCAGGACGCACCTGGCCC 
1 ELTGLLLTACGTDWS CVDRV- 

TGGCCGACCGGCAGGGGCACGACCGCCGCTACTCGCTCGACATCACGAAGATCCGGCAGG 

29401 + + + + + + 29460 

ACCGGCTGGCCGTCCCCGTGCTGGCGGCGATGAGCGAGCTGTAGTGCTTCTAGGCCGTCC 
1 ADRQGHDRRYSLDITKI RQE- 

AACTGGGCTACGAGCCCCTGGTCGCCTTCGAGGACGGCCTGGCCGCGACGGTGAAGTGGT 

29461 + + + + + + 29520 

TTGACCCGATGCTCGGGGACCAGCGGAAGCTCCTGCCGGACCGGCGCTGCCACTTCACCA 
1 LGYEPLVAFEDGLAATVKWY- 
ACCACGAGAACCGTTCGTGGTGGCAGCCGCTGAAGGAAGCGGCCGGCCTCCTGGACGCCG

29521 + + + + + + 29580 

TGGTGCTCTTGGCAAGCACCACCGTCGGCGACTTCCTTCGCCGGCCGGAGGACCTGCGGC 

1 HENRSWWQPLKEAAGLLDAV- 

TCGGCTGACGGCAGCCACCGCTAGGAACACCCCAGGAAAGGAGCCACCTCCGTGACAGCA 

29581 + + + + + + 29640 

AGCCGACTGCCGTCGGTGGCGATCCTTGTGGGGTCCTTTCCTCGGTGGAGGCACTGTCGT 
2-> M T A 

1-* G * - 

GTCAAGGAGCCGACGTCCCGCGCAGGACGGCGGGAGTGGATCGCTCTCGTCGTCCTCTCC 

29641 + + + + + + 29700 

CAGTTCCTCGGCTGCAGGGCGCGTCCTGCCGCCCTCACCTAGCGAGAGCAGCAGGAGAGG 

2 VKEPTSRAGRREWIALVVLS 

TTGCCCACGATGCTGTTGATGCTGGACATCAACGTCCTCATGCTGGCCTTGCCGCAGTTG 

29701 + + + + + + 29760 

AACGGGTGCTACGACAACTACGACCTGTAGTTGCAGGAGTACGACCGGAACGGCGTCAAC 
2 LPTMLLMLDINVLMLALPQL 

AGCGAGGATCTCGGCGCGAGCAGCACGCAACAGCTGTGGATCACCGACATCTACGGATTC 

29761 + + + + + + 29820 

TCGCTCCTAGAGCCGCGCTCGTCGTGCGTTGTCGACACCTAGTGGCTGTAGATGCCTAAG 
2 SEDLGASSTQQLWITDIYGF 

GCGATCGCCGGCTTCCTGGTGACCATGGGCACCCTCGGCGACCGGATCGGCCGCCGCAGG 

29821 + + + + + + 29880 

CGCTAGCGGCCGAAGGACCACTGGTACCCGTGGGAGCCGCTGGCCTAGCCGGCGGCGTCC 
2 AIAGFLVTMGTLGDRIGRRR 

CTCCTGCTCGGGGGCGCGGCCGTCTTCGCGGTCGTGTCCGTCGTCGCCGCGTTCTCCGAC 

29881 + + + + + + 29940 

GAGGACGAGCCCCCGCGCCGGCAGAAGCGCCAGCACAGGCAGCAGCGGCGCAAGAGGCTG 

2 LLLGGAAVFAVVSVVAAFSD 

AGCGCGGCGATGCTCGTCGTCAGCCGCGCCGTGCTCGGCGTCGCCGGGGCCACGGTGATG 

29941 + + + + + + 30000 

TCGCGCCGCTACGAGCAGCAGTCGGCGCGGCACGAGCCGCAGCGGCCCCGGTGCCACTAC 
2 SAAMLVVSRAVLGVAGATVM 

CCCTCGACGCTCGCGCTCATCAGCAACATGTTCGAGGACCCCAAGGAGCGGGGCACCGCC 

30001 + + + + + + 30060 

GGGAGCTGCGAGCGCGAGTAGTCGTTGTACAAGCTCCTGGGGTTCCTCGCCCCGTGGCGG 
2 PSTLALI SNMFEDPKERGTA 

ATCGCCATGTGGGCGAGCGCCATGATGGCCGGAGTCGCCCTCGGGCCCGCCGTCGGCGGC 

30061 + + + + + + 30120 

TAGCGGTACACCCGCTCGCGGTACTACCGGCCTCAGCGGGAGCCCGGGCGGCAGCCGCCG 
2 IAMWASAMMAGVALGPAVGG 

CTGGTCCTCGCCGCGTTCTGGTGGGGATCGGTGTTCCTCATCGCCGTTCCGGTGATGCTG 

30121 + + + + + + 30180 

GACCAGGAGCGGCGCAAGACCACCCCTAGCCACAAGGAGTAGCGGCAAGGCCACTACGAC 

2 LVLAAFWWGSVFL IAVPVML 

CTGGTGGTGGTCACCGGCCCCGTGCTGCTCACCGAGTCCCGCGACCCGGACGCCGGACGG 

30181 + + + + + + 30240 

GACCACCACCAGTGGCCGGGGCACGACGAGTGGCTCAGGGCGCTGGGCCTGCGGCCTGCC 
2 LVVVTGPVLLTESRDPDAGR 

CTGGACCTGCTGAGCGCGGGGCTCTCCCTCGCGACCGTGCTGCCGGTGATCTACGGACTG 

30241 + + + + + + 30300 

GACCTGGACGACTCGCGCCCCGAGAGGGAGCGCTGGCACGACGGCCACTAGATGCCTGAC 
2 LDLLSAGLSLATVLPVIYGL 
AAGGAGCTGGCCCGGACCGGGTGGGACCCGCTCGCCGCCGGCGCGGTGGTCCTCGGCGTG

30301 + + + + + + 30360 

TTCCTCGACCGGGCCTGGCCCACCCTGGGCGAGCGGCGGCCGCGCCACCAGGAGCCGCAC 
KELARTGWDPLAAGAVVLGV 

ATCTTCGGCGCGCTGTTCGTCCAGCGCCAGCGGCGGTTGGCCGACCCCATGCTGGACCTC 

30361 + + + + + + 30420 

TAGAAGCCGCGCGACAAGCAGGTCGCGGTCGCCGCCAACCGGCTGGGGTACGACCTGGAG 
IFGALFVQRQRRLADPMLDL

GGCCTCTTCGCCGACCGCACCCTGCGGGCGGGTCTGACGGTCAGTCTGGTCAACGCCGTC 

30421 + + + + + + 30480 

CCGGAGAAGCGGCTGGCGTGGGACGCCCGCCCAGACTGCCAGTCAGACCAGTTGCGGCAG 
GLFADRTLRAGLTVSLVNAV 

ATCATGGGCGGGACCGGACTGATGGTCGCCCTGTACCTCCAGACGATCGCCGGTCACTCC 

30481 + + + + + + 30540 

TAGTACCCGCCCTGGCCTGACTACCAGCGGGACATGGAGGTCTGCTAGCGGCCAGTGAGG 
IMGGTGLMVALYLQTIAGHS 

CCGTTGGCCGCCGGGCTGTGGCTGCTGATCCCGGCCTGCATGCTCGTCGTGGGCGTACAG 

30541 + + + + + + 30600 

GGCAACCGGCGGCCCGACACCGACGACTAGGGCCGGACGTACGAGCAGCACCCGCATGTC 
PLAAGLWLLIPACMLVVGVQ

CTGTCGAACCTGCTGGCCCAGCGGATGCCCCCTTCCCGGGTGCTGCTGGGGGGACTGCTG 

30601 + + + + + + 30660 

GACAGCTTGGACGACCGGGTCGCCTACGGGGGAAGGGCCCACGACGACCCCCCTGACGAC 
LSNLLAQRMPPSRVLLGGLL 

ATCGCGGCCGTCGGACAGCTCCTGATCACCCAGGTGGACACCGAGGACACCGCCCTCCTC 

30661 + + + + + + 30720 

TAGCGCCGGCAGCCTGTCGAGGACTAGTGGGTCCACCTGTGGCTCCTGTGGCGGGAGGAG 
IAAVGQLLITQVDTEDTALL 

ATCGCGGCCACCACCCTGATCTACTTCGGCGCCTCACCGGTGGGGCCGATCACCACGGGC 

30721 + + + + + + 30780 

TAGCGCCGGTGGTGGGACTAGATGAAGCCGCGGAGTGGCCACCCCGGCTAGTGGTGCCCG 
IAATTLIYFGASPVGPITTG 

GCGATCATGGGAGCCGCGCCCCCGGAGAAGGCGGGTGCCGCCTCGTCGCTGTCCGCCACC 

30781 + + + + + + 30840 

CGCTAGTACCCTCGGCGCGGGGGCCTCTTCCGCCCACGGCGGAGCAGCGACAGGCGGTGG 
AIMGAAPPEKAGAASSLSAT 

GGCGGCGAGTTCGGAGTGGCGCTCGGCATCGCGGGCCTGGGGAGTCTGGGCACCGTCGTG 

30841 + + + + + + 30900 

CCGCCGCTCAAGCCTCACCGCGAGCCGTAGCGCCCGGACCCCTCAGACCCGTGGCAGCAC 
GGEFGVALGIAGLGSLGTVV 

TACAGCGCCGGGGTCGAGGTGCCGGACGCGGCCGGGCCCGCCGACGCCGACGCCGCGCAG 

30901 + + + + + + 30960 

ATGTCGCGGCCCCAGCTCCACGGCCTGCGCCGGCCCGGGCGGCTGCGGCTGCGGCGCGTC 
YSAGVEVPDAAGPADADAAQ 

GAGAGCATCGCCGGCGCCCTGCACACGGCCGGTCAGCTGGCACCGGGCAGCGCCGACGCC 

30961 + + + + + + 31020 

CTCTCGTAGCGGCCGCGGGACGTGTGCCGGCCAGTCGACCGTGGCCCGTCGCGGCTGCGG 
ESIAGALHTAGQLAPGSADA 

CTGCTGGACTCCGCGCGCGCGGCCTTCACCAGCGGCGTGCAGTCCGTCGCCGCCGTCTGC 

31021 + + + + + + 31080 

GACGACCTGAGGCGCGCGCGCCGGAAGTGGTCGCCGCACGTCAGGCAGCGGCGGCAGACG 
LLDSARAAFTSGVQSVAAVC 
GCCGTGTTCTCCCTGGCGCTCGCCGTCCTCATCGGCACCCGGCTGCGGGACATTTCCGCG

31081 + + + + + + 31140 

CGGCACAAGAGGGACCGCGAGCGGCAGGAGTAGCCGTGGGCCGACGCCCTGTAAAGGCGC 

2 AVFSLALAVLIGTRLRDISA

ATGGACCACGGGCACGGCGAGGAACCGGCCGAGAACGACGCTCAACCGGCCACATGAGCG 

31141 + + + + + + 31200 

TACCTGGTGCCCGTGCCGCTCCTTGGCCGGCTCTTGCTGCGAGTTGGCCGGTGTACTCGC 

2- * MDHGHGEEPAENDAQPAT*- 

CACTTCCGGAGATGCAACGGCCGCCGTCGAGGTATGAGGATCACCTTCCGGGGTGCACCT 

31201 + + + + + + 31260 

GTGAAGGCCTCTACGTTGCCGGCGGCAGCTCCATACTCCTAGTGGAAGGCCCCACGTGGA 

GCACGGCAACGGAGGCGTAGTGGAGTACTGGAACAGCACGGCGGAGACCATGCCCCGCCA

31261 + + + + + + 31320 

CGTGCCGTTGCCTCCGCATCACCTCATGACCTTGTCGTGCCGCCTCTGGTACGGGGCGGT 

3- > MEYWNSTAETMPRQ 

GGAACTCGAACAGTGGAAGTGGCGCAGGCTCCAGGCCGCCATGGACCACGCCAGAAGGCT 

31321 + + + + + + 31380 

CCTTGAGCTTGTCACCTTCACCGCGTCCGAGGTCCGGCGGTACCTGGTGCGGTCTTCCGA 

3 ELEQWKWRRLQAAMDHARRL 

TTCGCCCTTCTGGCGGGAACGACTCCCCGAGAACATCACCTCCATGGCGGACTACGCGGC 

31381 + + + + + + 31440 

AAGCGGGAAGACCGCCCTTGCTGAGGGGCTCTTGTAGTGGAGGTACCGCCTGATGCGCCG 
3 SPFWRERLPENITSMADYAA-

GCGGGTGCCTCTCCTGCGCAAGGCCGACCTCCTCGCCGCGGAAGCCGCGTCTCCCCCTTA 

31441 + + + + + + 31500 

CGCCCACGGAGAGGACGCGTTCCGGCTGGAGGAGCGGCGCCTTCGGCGCAGAGGGGGAAT 
3 RVPLLRKADLLAAEAASPPY- 

CGGCACCTGGCCCTCGCTGGATCCGGCGCTCGGAGTGCGCCATCACCAGACCAGCGGCAC 

31501 + + + + + + 31560 

GCCGTGGACCGGGAGCGACCTAGGCCGCGAGCCTCACGCGGTAGTGGTCTGGTCGCCGTG 
3 GTWPSLDPALGVRHHQTSGT- 

CAGCGGTAACCCCCCCATCCGGACGTTCGACACCGAACGCGACTGGGCCTGGTGCGTGGA 

31561 + + + + + + 31620 

GTCGCCATTGGGGGGGTAGGCCTGCAAGCTGTGGCTTGCGCTGACCCGGACCACGCACCT 
3 SGNPPIRTFDTERDWAWCVD- 

CACGTTCTGCACGGCGCTCCACAGCATGGGCGTGCGCCCGCACCACAAGGGTCTGGTGGC 

31621 + + + + + + 31680 

GTGCAAGACGTGCCGCGAGGTGTCGTACCCGCACGCGGGCGTGGTGTTCCCAGACCACCG 
3 TFCTALHSMGVRPHHKGLVA- 

GTTCGGCTACGGGCTGTTCGCCGGTTTCTGGGGCATGCACTACGGCCTCGAGCGCATGGG 

31681 + + + + + + 31740 

CAAGCCGATGCCCGACAAGCGGCCAAAGACCCCGTACGTGATGCCGGAGCTCGCGTACCC
3 FGYGLFAGFWGMHYGLERMG- 

CGCCACGGTCATCCCGGCCGGCGGCCTCGACTCCCGCTCCCGGGTACGGCTGCTGGTCGA 

31741 + + + + + + 31800 

GCGGTGCCAGTAGGGCCGGCCGCCGGAGCTGAGGGCGAGGGCCCATGCCGACGACCAGCT 
3 ATVIPAGGLDSRSRVRLLVD

CTACCAGATCGAGGTGCTCGGCCTCACACCGAGCTATGCGATGCGGCTGATCGAGACGGC 

31801 + + + + + + 31860

GATGGTCTAGCTCCACGAGCCGGAGTGTGGCTCGATACGCTACGCCGACTAGCTCTGCCG 
3 YQIEVLGLTPSYAMRLIETA- 

CCGCGAGATGGGCATCGACCTCGCCCGCGAGGCTAACGTCCAGATCATCCTGGCCGGGGC 
31861 + + + + + + 31920 
GGCGCTCTACCCGTAGCTGGAGCGGGCGCTCCGATTGCAGGTCTAGTAGGACCGGCCCCG
3 REMGIDLAREANVQIILAGA-

GGAGCCGCGCTCCGCGTTCACCACCCGCACCATCGAGGAGGCCTTCGGCGCCCGGGTCTT 

31921 + + + + + + 31980 

CCTCGGCGCGAGGCGCAAGTGGTGGGCGTGGTAGCTCCTCCGGAAGCCGCGGGCCCAGAA 
3 EPRSAFTTRTIEEAFGARVF 

CAACGCCGCGGGCACCACTGAGTTCGGGGGGGTGTTCATGTTCGAGTGCACCGCCCGGCG 

31981 + + + + + + 32040 

GTTGCGGCGCCCGTGGTGACTCAAGCCCCCCCACAAGTACAAGCTCACGTGGCGGGCCGC 
3 NAAGTTEFGGVFMFECTARR 

CGAGGCCTGCCACATCATCGAACCCTCGTGCATCGAGGAGGTGCTCGACCCGGTGACGGA 

32041 + + + + + + 32100 

GCTCCGGACGGTGTAGTAGCTTGGGAGCACGTAGCTCCTCCACGAGCTGGGCCACTGCCT 
3 EACHIIEPSCIEEVLDPVTE- 

ACAGCCCGTCGGCTACGGCGAGGAGGGCGTCCGAGTCACCACCGGGCTGAACCGTGAGGG 

32101 + + + + + + 32160 

TGTCGGGCAGCCGATGCCGCTCCTCCCGCAGGCTCAGTGGTGGCCCGACTTGGCACTCCC 
3 QPVGYGEEGVRVTTGLNREG- 

GATGCAGCTCTTCCGGCACTGGACCGAGGACGTCGTGGTCAAGCGGCCCCACACCGAGTG 

32161 + + + + + + 32220 

CTACGTCGAGAAGGCCGTGACCTGGCTCCTGCAGCACCAGTTCGCCGGGGTGTGGCTCAC 
3 MQLFRHWTEDVVVKRPHTEC

CGGCTGCGGCCGGACGTGGGACTTCTACGACGGCGGCATCCTTCGGCGCGTGGACGACAT 

32221 + + + + + + 32280 

GCCGACGCCGGCCTGCACCCTGAAGATGCTGCCGCCGTAGGAAGCCGCGCACCTGCTGTA 
3 GCGRTWDFYDGGILRRVDDM- 

GCGCAAGATACGCGGGGTCTCGATCACCCCGGTGATGATCGAGGATGTGCTGCGCGGCTT 

32281 + + + + + + 32340 

CGCGTTCTATGCGCCCCAGAGCTAGTGGGGCCACTACTAGCTCCTACACGACGCGCCGAA 
3 RKIRGVSITPVMIEDVLRGF-

CGACGAGGTGAACGAGTTCCACTCGTCCATCCGGACCGTCCGCGGACTCGATACGATCCA 

32341 + + + + + + 32400 

GCTGCTCCACTTGCTCAAGGTGAGCAGGTAGGCCTGGCAGGCGCCTGAGCTATGCTAGGT 
3 DEVNEFHSSIRTVRGLDTIH-

CGTCAAGGTCGAGGCGGGAGACATCTCGGGTGAGGCGGCCGAGAGCCTGTGCGGCCGCAT 

32401 + + + + + + 32460 

GCAGTTCCAGCTCCGCCCTCTGTAGAGCCCACTCCGGCGGCTCTCGGACACGCCGGCGTA 
3 VKVEAGDISGEAAESLCGRI 

CACCGAGGAGTTCAAGCGTGAGATAGGCATACGGCCCCAGGTGGAGCTGACCCCCGCGGG 

32461 + + + + + + 32520 

GTGGCTCCTCAAGTTCGCACTCTATCCGTATGCCGGGGTCCACCTCGACTGGGGGCGCCC 
3 TEEFKREIGIRPQVELTPAG-

CAGCCTCCCCCGATCGAAGTGGAAGGCGGCACGACTTCATGACGAGCGCGAACTCGCCCC 

32521 + + + + + + 32580 

GTCGGAGGGGGCTAGCTTCACCTTCCGCCGTGCTGAAGTACTGCTCGCGCTTGAGCGGGG 
3 SLPRSKWKAARLHDERELAP 

TCAGGCCTGAGCAGGTGGAGCAGCTCCTGGTGAGCTACCGGAGCCTGGGCCTGCTGGAGC 

32581 + + + + + + 32640 

AGTCCGGACTCGTCCACCTCGTCGAGGACCACTCGATGGCCTCGGACCCGGACGACCTCG 
3-* Q A * - 

AGAGCTGCGCGGTCCCGGCCGTGCTCGCCGCGGTCAGGGCCGCCCGTGCGGAACTCCGTA 

32641 + + + + + + 32700 

TCTCGACGCGCCAGGGCCGGCACGAGCGGCGCCAGTCCCGGCGGGCACGCCTTGAGGCAT 
TCGCCCTGGACGGCCAGGGCGTGGAGTTCGAGTACTACCGGGGGCACGACGACAGCCTCG

32701 + + + + + + 32760 

AGCGGGACCTGCCGGTCCCGCACCTCAAGCTCATGATGGCCCCCGTGCTGCTGTCGGAGC 

TGGCCTGAACCCACCCCCGGTCCGCCGGGTCAGACGAAAGGGAGACCGGTGCCCCACGGT 

32761 + + + + + + 32820 

ACCGGACTTGGGTGGGGGCCAGGCGGCCCAGTCTGCTTTCCCTCTGGCCACGGGGTGCCA 
> M P H G 

GCAGAGCGCGAAGCGAGCCCGGCCGAGGAGAGCGCCGGCACCCGGCCGCTGACCGGCGAG 

32821 + + + + + + 32880 

CGTCTCGCGCTTCGCTCGGGCCGGCTCCTCTCGCGGCCGTGGGCCGGCGACTGGCCGCTC 
AEREASPAEESAGTRPLTGE

GAGTATCTGGAGAGCCTGCGGGACGCGCGGGAGGTGTACCTCGACGGCAGCCGCGTCAAG 

32881 + + + + + + 32940 

CTCATAGACCTCTCGGACGCCCTGCGCGCCCTCCACATGGAGCTGCCGTCGGCGCAGTTC 
EYLESLRDAREVYLDGSRVK 

GACGTCACCGCGCATCCCGCGTTCCACAACCCGGCCCGGATGACGGCCCGGCTGTACGAC 

32941 + + + + + + 33000 

CTGCAGTGGCGCGTAGGGCGCAAGGTGTTGGGCCGGGCCTACTGCCGGGCCGACATGCTG 
DVTAHPAFHNPARMTARLYD 

AGCCTGCACGACCCCGCCCAGAAAGCGGTCCTGACGGCGCCCACCGATGCCGGTGACGGT 

33001 + + + + + + 33060 

TCGGACGTGCTGGGGCGGGTCTTTCGCCAGGACTGCCGCGGGTGGCTACGGCCACTGCCA 
SLHDPAQKAVLTAPTDAGDG 

TTCACCCACCGCTTCTTCACCGCACCGCGCAGCGTCGACGACCTGGTCAAGGACCAGGCC 

33061 + + + + + + 33120 

AAGTGGGTGGCGAAGAAGTGGCGTGGCGCGTCGCAGCTGCTGGACCAGTTCCTGGTCCGG 
FTHRFFTAPRSVDDLVKDQA 

GCCATCGCATCCTGGGCGCGCAAGAGCTACGGCTGGATGGGGCGCAGCCCCGACTACAAG 

33121 + + + + + + 33180 

CGGTAGCGTAGGACCCGCGCGTTCTCGATGCCGACCTACCCCGCGTCGGGGCTGATGTTC 
AIASWARKSYGWMGRSPDYK

GCGTCGTTCCTCGGCACGCTGGGGGCCAACGCCGACTTCTACGAGCCCTTCGCGGACAAC 

33181 + + + + + + 33240 

CGCAGCAAGGAGCCGTGCGACCCCCGGTTGCGGCTGAAGATGCTCGGGAAGCGCCTGTTG 
ASFLGTLGANADFYEPFADN

GCCCGGCGCTGGTACCGGGAGTCGCAGGAGAAGGTGCTGTACTGGAACCATGCCTTCCTT 

33241 + + + + + + 33300 

CGGGCCGCGACCATGGCCCTCAGCGTCCTCTTCCACGACATGACCTTGGTACGGAAGGAA 
ARRWYRESQEKVLYWNHAFL 

CACCCGCCGGTCGACCGCTCGCTGCCCGCCGACGAGGTGGGCGACGTCTTCATCCACGTC 

33301 + + + + + + 33360 

GTGGGCGGCCAGCTGGCGAGCGACGGGCGGCTGCTCCACCCGCTGCAGAAGTAGGTGCAG 
HPPVDRSLPADEVGDVFIHV 

GAGCGGGAGACCGACGCGGGCCTGGTGGTGAGCGGGGCCAAGGTCGTCGCGACCGGATCG 

33361 + + + + + + 33420 

CTCGCCCTCTGGCTGCGCCCGGACCACCACTCGCCCCGGTTCCAGCAGCGCTGGCCTAGC 
ERETDAGLVVSGAKVVATGS 

GCCCTCACCCACGCGGCGTTCATCTCGCACTGGGGACTTCCCATCAAGGACCGGAAGTTC 

33421 + + + + + + 33480 

CGGGAGTGGGTGCGCCGCAAGTAGAGCGTGACCCCTGAAGGGTAGTTCCTGGCCTTCAAG 
ALTHAAFISHWGLPIKDRKF



GCCCTGGTGGCCACCGTGCCGATGGACGCGGACGGCCTCAAGGTGATCTGCCGTCCCTCC 
33481 + + + + + + 33540

CGGGACCACCGGTGGCACGGCTACCTGCGCCTGCCGGAGTTCCACTAGACGGCAGGGAGG 
ALVATVPMDADGLKVICRPS

TACTCCGCAAACGCGGCGACCACGGGCAGCCCGTTCGACAACCCGCTGTCCTCACGGCTG 

33541 + + + + + + 33600 

ATGAGGCGTTTGCGCCGCTGGTGCCCGTCGGGCAAGCTGTTGGGCGACAGGAGTGCCGAC 
YSANAATTGSPFDNPLSSRL

GACGAGAACGACGCCATCCTCGTACTCGACCAGGTGCTGATCCCCTGGGAGAACGTGTTC 

33601 + + + + + + 33660 

CTGCTCTTGCTGCGGTAGGAGCATGAGCTGGTCCACGACTAGGGGACCCTCTTGCACAAG 
DENDAILVLDQVLIPWENVF

GTCTACGGCAACCTGGGCAAGGTACATCTCCTCGCCGGACAGTCCGGGATGATCGAACGC 

33661 + + + + + + 33720 

CAGATGCCGTTGGACCCGTTCCATGTAGAGGAGCGGCCTGTCAGGCCCTACTAGCTTGCG 
VYGNLGKVHLLAGQSGMIER 

GCCACCTTCCACGGGTGCACCCGGCTCGCCGTGAAGCTGGAGTTCATCGCCGGGCTGCTG 

33721 + + + + + + 33780 

CGGTGGAAGGTGCCCACGTGGGCCGAGCGGCACTTCGACCTCAAGTAGCGGCCCGACGAC 
ATFHGCTRLAVKLEFIAGLL 

GCCAAGGCGCTGGACATCACCGGGGCGAAGGACTTCCGCGGTGTGCAGACCCGGCTCGGA 

33781 + + + + + + 33840 

CGGTTCCGCGACCTGTAGTGGCCCCGCTTCCTGAAGGCGCCACACGTCTGGGCCGAGCCT 
AKALDITGAKDFRGVQTRLG 

GAAGTCCTGGCCTGGCGCAACCTCTTCTGGTCACTGTCGGACGCGGCGGCCCGCAACCCC 

33841 + + + + + + 33900 

CTTCAGGACCGGACCGCGTTGGAGAAGACCAGTGACAGCCTGCGCCGCCGGGCGTTGGGG 
EVLAWRNLFWSLSDAAARNP 

GTCCCCTGGAAGAACGGCACGCTCCTGCCCAACCCTCAGGCGGGTATGGCCTACCGCTGG 

33901 + + + + + + 33960 

CAGGGGACCTTCTTGCCGTGCGAGGACGGGTTGGGAGTCCGCCCATACCGGATGGCGACC 
VPWKNGTLLPNPQAGMAYRW 

TTCATGCAGATCGGCTACCCGCGGGTCCTGGAGATCGTCCAACAGGACGTGGCCAGCGGC 

33961 + + + + + + 34020 

AAGTACGTCTAGCCGATGGGCGCCCAGGACCTCTAGCAGGTTGTCCTGCACCGGTCGCCG 
FMQIGYPRVLEIVQQDVASG 

CTCATGTACGTCAACTCCTCCACGGAGGACTTCCGCAACCCCGAGACCGGCCCCTACTTG 

34021 + + + + + + 34080 

GAGTACATGCAGTTGAGGAGGTGCCTCCTGAAGGCGTTGGGGCTCTGGCCGGGGATGAAC 
LMYVNSSTEDFRNPETGPYL 

GAGAAGTACCTCCGGGGCAGCGACGGCGCAGGCGCCGTCGAGCGTGTCAAGGTGATGAAG 

34081 + + + + + + 34140 

CTCTTCATGGAGGCCCCGTCGCTGCCGCGTCCGCGGCAGCTCGCACAGTTCCACTACTTC 
EKYLRGSDGAGAVERVKVMK 

CTGCTGTGGGACGCGGTGGGATCCGACTTCGGCGGCCGGCACGAACTCTACGAGCGGAAC 

34141 + + + + + + 34200 

GACGACACCCTGCGCCACCCTAGGCTGAAGCCGCCGGCCGTGCTTGAGATGCTCGCCTTG 
LLWDAVGSDFGGRHELYERN 

TACTCCGGGAACCACGAGAACACCCGGATCGAGTTGCTGCTGTCGCAGACGGCGAGCGGC 

34201 + + + + + + 34260 

ATGAGGCCCTTGGTGCTCTTGTGGGCCTAGCTCAACGACGACAGCGTCTGCCGCTCGCCG 
YSGNHENTRIELLLSQTASG 

AAACTGGACTCGTACATGGACTTCGCCCAGGCATGCATGGACGAGTACGACCTGGACGGC 
34261 + + + + + + 34320 
TTTGACCTGAGCATGTACCTGAAGCGGGTCCGTACGTACCTGCTCATGCTGGACCTGCCG
4 KLDSYMDFAQACMDEYDLDG 

TGGACCGCTCCCGACCTGGAGTCGTTTCACGCGATGCGTTCCGCCTCCCGCGACCTTCTC 

34321 + + + + + + 34380 

ACCTGGCGAGGGCTGGACCTCAGCAAAGTGCGCTACGCAAGGCGGAGGGCGCTGGAAGAG 
4 WTAPDLESFHAMRSASRDLL 

GGAGGGCTGTAGTTCCCCGACGGTGTACTGCGGCCCCCGATCCGGGGGCCGCAGTACACC 

34381 + + + + + + 34440 

CCTCCCGACATCAAGGGGCTGCCACATGACGCCGGGGGCTAGGCCCCCGGCGTCATGTGG 
4-* G G L * - 

GTCGGGGCGGCTGGTGCTCAGCCGCGCAGGAATCCGATGAGCTCGGGGGCGAGCTTCTTG 

34441 + + + + + + 34500 

CAGCCCCGCCGACCACGAGTCGGCGCGTCCTTAGGCTACTCGAGCCCCCGCTCGAAGAAC 
22-* *GRLFGILEPALKK- 

GGCGCCATGGCGACGGCACCGTGGTTGAGCCCGTTCAGGGTGCGGTGGCTCGCGTCGGGG 

34501 + + + + + + 34560 

CCGCGGTACCGCTGCCGTGGCACCAACTCGGGCAAGTCCCACGCCACCGAGCGCAGCCCC 
22 PAMAVAGHNLGNLTRHSADP- 

AGGACTCCGGTGAGTTCCTTCGCGGCACGCTGGAAACCGTCGGGGCTCTTGGAACCGGTC 

34561 + + + + + + 34620 

TCCTGAGGCCACTCAAGGAAGCGCCGTGCGACCTTTGGCAGCCCCGAGAACCTTGGCCAG 
22 LVGTLEKAARQFGDPSKSGT- 

AGCACCAGGGTCGGGGCCGACGCCGCCGACCACGGCTCGGCGGGGAGCGGCTTGCCCTGC 

34621 + + + + + + 34680 

TCGTGGTCCCAGCCCCGGCTGCGGCGGCTGGTGCCGAGCCGCCCCTCGCCGAACGGGACG 
22 LVLTPASAASWPEAPLPKGQ 

TGGGTGTCGCCCATCACCGCGATGTCGTAGGGAAGCGTGTTGGCCAGACCCTTGAGGTTG 

34681 + + + + + + 34740 

ACCCACAGCGGGTAGTGGCGCTACAGCATCCCTTCGCACAACCGGTCTGGGAACTCCAAC 
22 QTDGMVAIDYPLTNALGKLN- 

GACCAGACACCGGGCATCAGGCGCATGGCGCCGACCATGAAGGAGGGCATGCCCTGTGCC 

34741 + + + + + + 34800

CTGGTCTGTGGCCCGTAGTCCGCGTACCGCGGCTGGTACTTCCTCCCGTACGGGACACGG 
22 SWVGPMLRMAGVMFSPMGQA-

TTGACCATGAAGGCCTTGACCGCGTCGCTGCGTCGGTCCTCCGCCAGAAGGCTGTCGATC 

34801 + + + + + + 34860 

AACTGGTACTTCCGGAACTGGCGCAGCGACGCAGCCAGGAGGCGGTCTTCCGACAGCTAG 
22 KVMFAKVADSRRDEALLSDI 

TGACCGCCGAAGCCGGCGGGCGGGCCGAAGCCGTCCGAGGTGACGGAGAACGGCGGCTCG 

34861 + + + + + + 34920 

ACTGGCGGCTTCGGCCGCCCGCCCGGCTTCGGCAGGCTCCACTGCCTCTTGCCGCCGAGC 
22 QGGFGAPPGFGDSTVSFPPE

TAGACCGCGAGCTTGTTCACCTTCAGGCCGGCGGCGGCGGCTCGCAGGGCGAGCACCGCG 

34921 + + + + + + 34980 

ATCTGGCGCTCGAACAAGTGGAAGTCCGGCCGCCGCCGCCGAGCGTCCCGCTCGTGGCGC 
22 YVALKNVKLGAAAARLALVA 

CCGGAAGAGCTGCCGAACAGGGAGGCCGAACCGCCGACCTGGTCGATCAGCGCCGCGATG 

34981 + + + + + + 35040 

GGCCTTCTCGACGGCTTGTCCCTCCGGCTTGGCGGCTGGACCAGCTAGTCGCGGCGCTAC 
22 GSSSGFLSASGGVQDILAAI- 

TCCTCGATCTCGCGCTCGACCGCGTACGCCGGACCGTCGGCGCTGGCGCCGCGGCCCCGA 

35041 + + + + + + 35100 

AGGAGCTAGAGCGCGAGCTGGCGCATGCGGCCTGGCAGCCGCGACCGCGGCGCCGGGGCT 
22 DEIEREVAYAPGDASAGRGR

CGGTCGTAGTTGACGACCGTGAAGTGCTCGGCGAGGAGACCGGCGAGCTTCTTGGCGTCG 

35101 + + + + + + 35160 

GCCAGCATCAACTGCTGGCACTTCACGAGCCGCTCCTCTGGCCGCTCGAAGAACCGCAGC 
22 RDYNVVTFHEALLGALKKAD 

GAGCGGTCGGCCAAGGCGGAGGCCACCAGGATCACCGCCGGCCCCTCGCCCGACTTGTCG 

35161 + + + + + + 35220 

CTCGCCAGCCGGTTCCGCCTCCGGTGGTCCTAGTGGCGGCCGGGGAGCGGGCTGAACAGC 
22 SRDALASAVLIVAPGEGSKD- 

AAGGCGATCGTGGTGCCGTCGGCCGATACCGTCGTTGATTCCACCTTGGCTGCTTTCTCA 

35221 + + + + + + 35280 

TTCCGCTAGCACCACGGCAGCCGGCTATGGCAGCAACTAAGGTGGAACCGACGAAAGAGT 
22 FIATTGDASVTTSEVKAAKE

CGGGTTGAAGACATAGCTTCCCTCAGATCACATTGTGGGGCGTGCTGCCGACAGTGGAGA 

35281 + + + + + + 35340 

GCCCAACTTCTGTATCGAAGGGAGTCTAGTGTAACACCCCGCACGACGGCTGTCACCTCT 
22-< R T S S M - 

CCGGCGTCCGGAGGAAAAGTAATCGGTCCTGCCAGAATTGGGGGTTCCGGAGGGCACGCC 

35341 + + + + + + 35400 

GGCCGCAGGCCTCCTTTTCATTAGCCAGGACGGTCTTAACCCCCAAGGCCTCCCGTGCGG 

GACCGCTGCACGACGGCGCGCCCCGACCTTCCGGACATTGTCGTGCCCTCAGATGTGTTT 

35401 + + + + + + 35460 

CTGGCGACGTGCTGCCGCGCGGGGCTGGAAGGCCTGTAACAGCACGGGAGTCTACACAAA 

CGCATCTTCAGGAGTGCTCAGTGATCCGTGAGGTGAGAAAGGGACGGTGGTCCGGTCAGT 

35461 + + + + + + 35520 

GCGTAGAAGTCCTCACGAGTCACTAGGCACTCCACTCTTTCCCTGCCACCAGGCCAGTCA 
18-* * 

CGTTGCCGCGCGGGCTGTTCTGGTAAGCGGCCAGACGCCACTGCCCGTCCTGTTCGACGG 

35521 + + + + + + 35580 

GCAACGGCGCGCCCGACAAGACCATTCGCCGGTCTGCGGTGACGGGCAGGACAAGCTGCC 
18 DNGRPSNQYAALRWQGDQEV 

CCAGCCAGGAGGCCCGGACGGCGCCGTCGCCGCTCGCCTCGGTCTCCCCCGGGGCGAGGA 

35581 + + + + + + 35640 

GGTCGGTCCTCCGGGCCTGCCGCGGCAGCGGCGAGCGGAGCCAGAGGGGGCCCCGCTCCT 
18 ALWSARVAGDGSAETEGPAL 

TGCCGCCCTCGGTGATGAGCAGGGCGATGCCGTCGCCGAGCAGGCGCGCGTCGATGGGGC 

35641 + + + + + + 35700 

ACGGCGGGAGCCACTACTCGTCCCGCTACGGCAGCGGCTCGTCCGCGCGCAGCTACCCCG 
18 IGGETILLAIGDGLLRADIP 

TGCCGATGACACGGGTGCCCTTGTACGGGCCCGCGAAGGCGGCCGCCATGTGGGTGCGGA 

35701 + + + + + + 35760 

ACGGCTACTGTGCCCACGGGAACATGCCCGGGCGCTTCCGCCGGCGGTACACCCACGCCT 
18 SGIVRTGKYPGAFAAAMHTR 

TGTTCTCGCGGCCCTTGCGGAAGAGGCCGGGGAGGATCATCGTCCCGTCCTCGGCGAAGA 

35761 + + + + + + 35820 

ACAAGAGCGCCGGGAACGCCTTCTCCGGCCCCTCCTAGTAGCAGGGCAGGAGCCGCTTCT 
18 INERGKRFLGPLIMTGDEAF 

CGTCGGCGAACCGGTCGGCGTCGTGGTCGGCCCAGGCGGCCACGATGCGCGCCGGCAGAG 

35821 + + + + + + 35880 

GCAGCCGCTTGGCCAGCCGCAGCACCAGCCGGGTCCGCCGGTGCTACGCGCGGCCGTCTC 
18 VDAFRDADHDAWAAVIRAPL 

CGGCTACCGCTGCCAGGGCGGCGTCGGGAGCGGAGGTGGTCGAGTCGGTGCTGGTCATAT 
35881 + + + + + + 35940

GCCGATGGCGACGGTCCCGCCGCAGCCCTCGCCTCCACCAGCTCAGCCACGACCAGTATA 
-< AAVAALAADPASTTSDTSTM 

CGCGGTTCCCGTCCGTTGGTTGGCGGTTTCGGCACGGCCCGCAGCCCTGCCCGAGCCCGA 

35941 + + + + + + 36000

GCGCCAAGGGCAGGCAACCAACCGCCAAAGCCGTGCCGGGCGTCGGGACGGGCTCGGGCT 

CGCTGGCAGGCGGCCCCGTCATCAGGCATCTCCTGCGTTGCGCCCCACGCCAGTCACTTC 

36001 + + + + + + 36060 

GCGACCGTCCGCCGGGGCAGTAGTCCGTAGAGGACGCAACGCGGGGTGCGGTCAGTGAAG 

ACGGCCAGAACAAGTCGCGCATTCTGGAAGAAGCTGAGGCCCGCGACCCGGTGCGACGAT 

36061 + + + + + + 36120

TGCCGGTCTTGTTCAGCGCGTAAGACCTTCTTCGACTCCGGGCGCTGGGCCACGCTGCTA 

CTGCGGTGTCACGGAGTTCGCACACGTTTACGCACGGAGGCTCGATGCCCGCTGTCAATG 

36121 + + + + + + 36180 

GACGCCACAGTGCCTCAAGCGTGTGCAAATGCGTGCCTCCGAGCTACGGGCGACAGTTAC 
> MPAVNG- 

GATCGGTGCAGTCAGGCCAGTCGCACCGACGCTCCGTCGTGGCGACGGTGGTGGGCAACT 

36181 + + + + + + 36240 

CTAGCCACGTCAGTCCGGTCAGCGTGGCTGCGAGGCAGCACCGCTGCCACCACCCGTTGA 
SVQSGQSHRRSVVATVVGNF-

TCGTGGAGTCGTTCGACTGGCTCGCCTACGGGCTCTTCGCTCCTCTCTTCGCGGCTCAGT 

36241 + + + + + + 36300 

AGCACCTCAGCAAGCTGACCGAGCGGATGCCCGAGAAGCGAGGAGAGAAGCGCCGAGTCA 
VESFDWLAYGLFAPLFAAQF- 

TCTTCCCCTCGTCCAACCAGTTCACCTCCCTGCTCGGCGCGTTCGCGGTCTTCGGCACGG 

36301 + + + + + + 36360 

AGAAGGGGAGCAGGTTGGTCAAGTGGAGGGACGAGCCGCGCAAGCGCCAGAAGCCGTGCC 
FPSSNQFTSLLGAFAVFGTG- 

GCATGCTCTTCCGGCCGATCGGCGGGGTCCTGCTGGGCCGCCTCGCCGACCGGCGCGGCC 

36361 + + + + + + 36420 

CGTACGAGAAGGCCGGCTAGCCGCCCCAGGACGACCCGGCGGAGCGGCTGGCCGCGCCGG 
MLFRPIGGVLLGRLADRRGR-

GGCGCCCCGCCCTGATGCTGGCGATCGGACTGATGACCGGCGGCTCGACCCTGATCGCCG 

36421 + + + + + + 36480 

CCGCGGGGCGGGACTACGACCGCTAGCCTGACTACTGGCCGCCGAGCTGGGACTAGCGGC 
RPALMLAIGLMTGGSTLIAV-

TCGTCCCCACCTACGAGCACATCGGGATCCTCGCCCCGCTGCTTCTGCTGCTCGCCCGGC 

36481 + + + + + + 36540 

AGCAGGGGTGGATGCTCGTGTAGCCCTAGGAGCGGGGCGACGAAGACGACGAGCGGGCCG 
VPTYEHIGILAPLLLLLARL- 

TCGCCCAGGGAGTCTCCTCGGGCGGGGAATGGACAGCGGCGGCCACCTACCTGATGGAGA 

36541 + + + + + + 36600 

AGCGGGTCCCTCAGAGGAGCCCGCCCCTTACCTGTCGCCGCCGGTGGATGGACTACCTCT 
AQGVSSGGEWTAAATYLMEI-

TCGCGCCGAAGAACCGCCGGTGCCTCTACAGCAGCCTCTTCTCCGTGACGACCATGGCGG 

36601 + + + + + + 36660 

AGCGCGGCTTCTTGGCGGCCACGGAGATGTCGTCGGAGAAGAGGCACTGCTGGTACCGCC 
APKNRRCLYSSLFSVTTMAG- 

GCCCCTTCGTCGCATCGCTGCTGGGCGCGGGCCTCGGCGTGTGGCTGGGAACCGCGACGA 

36661 + + + + + + 36720 

CGGGGAAGCAGCGTAGCGACGACCCGCGCCCGGAGCCGCACACCGACCCTTGGCGCTGCT 
PFVASLLGAGLGVWLGTATM- 
TGGAGGCCTGGGGCTGGCGGGTGCCGTTCCTCCTCGGCGGCGTCTTCGGCGTGATCCTGC

36721 + + + + + + 36780 

ACCTCCGGACCCCGACCGCCCACGGCAAGGAGGAGCCGCCGCAGAAGCCGCACTAGGACG 
5 EAWGWRVPFLLGGVFGVILL- 

TGTTCCTGCGCCGTCGGCTCACCGAGACCGAGGTCTTCCGCCGGGAGGTGCGGCCCCGGG 

36781 + + + + + + 36840 

ACAAGGACGCGGCAGCCGAGTGGCTCTGGCTCCAGAAGGCGGCCCTCCACGCCGGGGCCC 
5 FLRRRLTETEVFRREVRPRA- 

CCCGGCGCGGCTCACTGGGCCAGCTGATCGGAGCCCACCGCCCCCAGGTGCTGCTGGCCG 

36841 + + + + + + 36900 

GGGCCGCGCCGAGTGACCCGGTCGACTAGCCTCGGGTGGCGGGGGTCCACGACGACCGGC 
5 RRGSLGQLIGAHRPQVLLAV-

TGATGCTGGTGGCCGGACTGGGCGTCATCGGCGGAACGTGGTCGACCGCGGTCCCGGCGA 

36901 + + + + + + 36960 

ACTACGACCACCGGCCTGACCCGCAGTAGCCGCCTTGCACCAGCTGGCGCCAGGGCCGCT 
5 MLVAGLGVIGGTWSTAVPAM-

TGGGCCACCGTCTGATCGGCTCGCAGACGATGTTCTGGGTGGTGGTCTGTGTGACCGGCT 

36961 + + + + + + 37020 

ACCCGGTGGCAGACTAGCCGAGCGTCTGCTACAAGACCCACCACCAGACACACTGGCCGA
5 GHRLIGSQTMFWVVVCVTGS-

CGGTCATCCTGCTGCAGGTACCCATAGGGCTGCTCGCCGACCGGGTGGAACCGGGCAGGT 

37021 + + + + + + 37080 

GCCAGTAGGACGACGTCCATGGGTATCCCGACGAGCGGCTGGCCCACCTTGGCCCGTCCA 
5 VILLQVPIGLLADRVEPGRF-

TCCTGATCGTCTCCAGCGTCGTCTTCGCCGCTGTGGGCTCGTACGCCTACCTCACCGTCC 

37081 + + + + + + 37140 

AGGACTAGCAGAGGTCGCAGCAGAAGCGGCGACACCCGAGCATGCGGATGGAGTGGCAGG 
5 LIVSSVVFAAVGSYAYLTVQ- 

AGGACTCCTTCGCGAGCCTGGCGTTCACGTACAGCACCGGAGTGATCTTCCTCGGCTGCG 

37141 + + + + + + 37200 

TCCTGAGGAAGCGCTCGGACCGCAAGTGCATGTCGTGGCCTCACTAGAAGGAGCCGACGC 
5 DSFASLAFTYSTGVIFLGCV-

TCACCATGGTGCTGCCGAAGATGCTCTCCAGAATCTTCCCTCCGCAGATACGCGGCCTGG 

37201 + + + + + + 37260 

AGTGGTACCACGACGGCTTCTACGAGAGGTCTTAGAAGGGAGGCGTCTATGCGCCGGACC 
5 TMVLPKMLSRIFPPQIRGLG-

GCATCGGGCTGCCGCACGCCTCGACCACCGCACTCCTCGGCGGGGCGGGGCCACTGCTGG 

37261 + + + + + + 37320 

CGTAGCCCGACGGCGTGCGGAGCTGGTGGCGTGAGGAGCCGCCCCGCCCCGGTGACGACC 
5 IGLPHASTTALLGGAGPLLA-

CCGCCTACTCCGACGAGCGAGGCGCCTCGGGCTGGTTCATCGCCGCCGTGATGGCCGCGG 

37321 + + + + + + 37380 

GGCGGATGAGGCTGCTCGCTCCGCGGAGCCCGACCAAGTAGCGGCGGCACTACCGGCGCC 
5 AYSDERGASGWFIAAVMAAV-

TCCTGCTCGCCTGGCCGGCCACCCTGTGGGAGCGACGGCTGTTCCGCGCCCGGACGGCCC 

37381 + + + + + + 37440

AGGACGAGCGGACCGGCCGGTGGGACACCCTCGCTGCCGACAAGGCGCGGGCCTGCCGGG 
5 LLAWPATLWERRLFRARTAP- 

CGGGAAGCGAGCCGGTTCCCGAATCCGCCGTCGCCCGCCCCGTCGGGTGACCGTCCGCAC 

37441 + + + + + + 37500 

GCCCTTCGCTCGGCCAAGGGCTTAGGCGGCAGCGGGCGGGGCAGCCCACTGGCAGGCGTG 
5-* GSEPVPESAVARPVG*- 
TTCTGCATCCCGTCCGGCACCGAGCGCCGGCGACCTTCCCGACTGAGAGGTTGACATCAT

37501 + + + + + + 37560 

AAGACGTAGGGCAGGCCGTGGCTCGCGGCCGCTGGAAGGGCTGACTCTCCAACTGTAGTA 
23-> M - 

GACGACGTCCGACACCACCGACCGGTCCCAGGACGGCGTGCCGCCGCTCTCCTTCCACCA 

37561 + + + + + + 37620 

CTGCTGCAGGCTGTGGTGGCTGGCCAGGGTCCTGCCGCACGGCGGCGAGAGGAAGGTGGT 
23 TTSDTTDRSQDGVPPLSFHQ- 

GGAGTTCCTGTGCATGTTCGACAGCGGGAACGACGGCGCCGACGTGGGGCCGTTCGGCCC 

37621 + + + + + + 37680 

CCTCAAGGACACGTACAAGCTGTCGCCCTTGCTGCCGCGGCTGCACCCCGGCAAGCCGGG 
23 EFLCMFDSGNDGADVGPFGP- 

CATGTACCACATCGTCGGAGCCTGGCGGCTGACCGGCGGGATCGACGAGGAGACCCTGCG 

37681 + + + + + + 37740 

GTACATGGTGTAGCAGCCTCGGACCGCCGACTGGCCGCCCTAGCTGCTCCTCTGGGACGC 
23 MYHIVGAWRLTGGIDEETLR 

CGAGGCGCTGGGTGACGTCGTCGTGCGCCACGAGGCCCTGCGCACATCGCTGGTCCGCGA 

37741 + + + + + + 37800 

GCTCCGCGACCCACTGCAGCAGCACGCGGTGCTCCGGGACGCGTGTAGCGACCAGGCGCT 
23 EALGDVVVRHEALRTSLVRE- 

AGGTGGCACGCACCGGCCGGAGATCCTGCCTGCGGGGCCCGCCGCGCTGGAGGTCCGTGA 

37801 + + + + + + 37860 

TCCACCGTGCGTGGCCGGCCTCTAGGACGGACGCCCCGGGCGGCGCGACCTCCAGGCACT 
23 GGTHRPEILPAGPAALEVRD

TCTCGGCGACGTCGACGAGTCGGAGCGGGTGCGGCGCGGTGAGGAACTGCTCAACGAGGT 

37861 + + + + + + 37920 

AGAGCCGCTGCAGCTGCTCAGCCTCGCCCACGCCGCGCCACTCCTTGACGAGTTGCTCCA 
23 LGDVDESERVRRGEELLNEV- 

GGAGTCGACCGGTCTGAGCGTGCGGGAGCTGCCCCTGCTGCGGGCCGTGCTCGGACGCTT 

37921 + + + + + + 37980 

CCTCAGCTGGCCAGACTCGCACGCCCTCGACGGGGACGACGCCCGGCACGAGCCTGCGAA 
23 ESTGLSVRELPLLRAVLGRF 

CGACCAGAAGGACGCGGTGCTGGTCCTCATCGCCCACCACACCGCCGCGGACGCCTGGGC 

37981 + + + + + + 38040 

GCTGGTCTTCCTGCGCCACGACCAGGAGTAGCGGGTGGTGTGGCGGCGCCTGCGGACCCG 
23 DQKDAVLVLIAHHTAADAWA

CATGCACGTCATCGCCCGCGACCTGCTCAACCTGTACGCCGCCAGGCGCGGGAACCCGGT 

38041 + + + + + + 38100 

GTACGTGCAGTAGCGGGCGCTGGACGAGTTGGACATGCGGCGGTCCGCGCCCTTGGGCCA 
23 MHVIARDLLNLYAARRGNPV- 

TCCCCCGCTCCCCGAGCCGGCCCAGCATGCCGAGTTCGCCCGCTGGGAGCGCGAGGCGGC 

38101 + + + + + + 38160 

AGGGGGCGAGGGGCTCGGCCGGGTCGTACGGCTCAAGCGGGCGACCCTCGCGCTCCGCCG 
23 PPLPEPAQHAEFARWEREAA- 

CGAGGCACCGCGGGTCGCGGTCTCGAAGGAATTCTGGCGCAAGCGCCTCCAGGGCGCGCG 

38161 + + + + + + 38220 

GCTCCGTGGCGCCCAGCGCCAGAGCTTCCTTAAGACCGCGTTCGCGGAGGTCCCGCGCGC 
23 EAPRVAVSKEFWRKRLQGAR

GATCATCGGGCTGGAGACGGACATACCGCGCTCGGCGGGGCTGCCCAAGGGCACCGCGTG 

38221 + + + + + + 38280 

CTAGTAGCCCGACCTCTGCCTGTATGGCGCGAGCCGCCCCGACGGGTTCCCGTGGCGCAC 
23 IIGLETDIPRSAGLPKGTAW

GCAGCGCTTCGCCGTACGCGGGGAACTGGCCGACGCCGTGGTGGAGTTCTCACGGGCCGC 
38281 + + + + + + 38340

CGTCGCGAAGCGGCATGCGCCCCTTGACCGGCTGCGGCACCACCTCAAGAGTGCCCGGCG 
23 QRFAVRGELADAVVEFSRAA- 

CAAGTGCTCCCCGTTCATGACCATGTTCGCCGCCTACCAGGTGCTGCTGCACCGCAGGAC 

38341 + + + + + + 38400 

GTTCACGAGGGGCAAGTACTGGTACAAGCGGCGGATGGTCCACGACGACGTGGCGTCCTG 
23 KCSPFMTMFAAYQVLLHRRT

GGGCGAGCTGGACATCACCGTGCCGACCTTCTCCGGGGGGCGCAACAACTCGCGGTTCGA 

38401 + + + + + + 38460 

CCCGCTCGACCTGTAGTGGCACGGCTGGAAGAGGCCCCCCGCGTTGTTGAGCGCCAAGCT 
23 GELDITVPTFSGGRNNSRFE- 

GGACACCGTCGGTTCCTTCATCAACTTCCTGCCGCTGCGTACCGACCTCTCCGGATGCGC 

38461 + + + + + + 38520 

CCTGTGGCAGCCAAGGAAGTAGTTGAAGGACGGCGACGCATGGCTGGAGAGGCCTACGCG
23 DTVGSFINFLPLRTDLSGCA- 

ATCCTTCCGCGAGGTCGTGCTGCGCACCCGCACCACCTGCGGAGAGGCGTTCACCCACGA 

38521 + + + + + + 38580 

TAGGAAGGCGCTCCAGCACGACGCGTGGGCGTGGTGGACGCCTCTCCGCAAGTGGGTGCT 
23 SFREVVLRTRTTCGEAFTHE

GCTGCCCTTCTCCCGGCTGATCCCGGAGGTGCCGGAGCTGATGGCGTCGGCGGCCTCCGA 

38581 + + + + + + 38640 

CGACGGGAAGAGGGCCGACTAGGGCCTCCACGGCCTCGACTACCGCAGCCGCCGGAGGCT 
23 LPFSRLIPEVPELMASAASD

CAACCACCAGATCTCCGTCTTCCAGGCCGTGCACGCGCCCGCGTCCGAGGGGCCCGAGCA 

38641 + + + + + + 38700 

GTTGGTGGTCTAGAGGCAGAAGGTCCGGCACGTGCGCGGGCGCAGGCTCCCCGGGCTCGT 
23 NHQISVFQAVHAPASEGPEQ-

GGCCGGGGACCTGACGTACTCGAAGATCTGGGAGCGGCAGCTGTCGCAGGCGGAGGGCTC 

38701 + + + + + + 38760 

CCGGCCCCTGGACTGCATGAGCTTCTAGACCCTCGCCGTCGACAGCGTCCGCCTCCCGAG 
23 AGDLTYSKIWERQLSQAEGS- 

CGACATCCCCGACGGGGTGCTGTGGTCGATCCACATCGACCCCTCGGGCTCCATGGCCGG 

38761 + + + + + + 38820 

GCTGTAGGGGCTGCCCCACGACACCAGCTAGGTGTAGCTGGGGAGCCCGAGGTACCGGCC 
23 DIPDGVLWSIHIDPSGSMAG- 

CAGCCTCGGGTACAACACCAACCGCTTCAAGGACGAGACGATGGCGGCCTTCCTGGCCGA 

38821 + + + + + + 38880 

GTCGGAGCCCATGTTGTGGTTGGCGAAGTTCCTGCTCTGCTACCGCCGGAAGGACCGGCT 
23 SLGYNTNRFKDETMAAFLAD- 

CTACCTCGACGTGCTCGAGAACGCGGTGGCCCGGCCGGACGCCCCCTTCACCTCCTGAGA 

38881 + + + + + + 38940 

GATGGAGCTGCACGAGCTCTTGCGCCACCGGGCCGGCCTGCGGGGGAAGTGGAGGACTCT 
23-* YLDVLENAVARPDAPFTS *- 

CAGTTCCGGCGGCGGCGAACCCGCCCGAAGAAAGGAAAGCCAGTGTCCACCGTTTCCGAC 

38941 + + + + + + 39000 

GTCAAGGCCGCCGCCGCTTGGGCGGGCTTCTTTCCTTTCGGTCACAGGTGGCAAAGGCTG 
26-> M S T V S D 

ACAGCGGCCGGCTCCTCCCTGGAGGAGAAGGTCACCCGGATCTGGACGGGTGTTCTCGGC 

39001 + + + + + + 39060 

TGTCGCCGGCCGAGGAGGGACCTCCTCTTCCAGTGGGCCTAGACCTGCCCACAAGAGCCG 
26 TAAGSSLEEKVTRIWTGVLG

ACGTCCGGTGAGGAAGGCGCGACGTTCATCGAGCTCGGAGGGCAGTCGGTCTCGGCCGTG 
39061 + + + + + + 39120 
TGCAGGCCACTCCTTCCGCGCTGCAAGTAGCTCGAGCCTCCCGTCAGCCAGAGCCGGCAC
TSGEEGATFIELGGQSVSAV 

CGCATCGCCACGCGTATCCAGGAGGAGCTCGACATCTGGGTCGACATCGGCGTCCTCTTC 

39121 + + + + + + 39180 

GCGTAGCGGTGCGCATAGGTCCTCCTCGAGCTGTAGACCCAGCTGTAGCCGCAGGAGAAG 
RIATRIQEELDIWVDIGVLF 

GACGACCCGGATCTGCCTACCTTCATCGCGGCGGTCGTCCGGACGGCCGACGCCGCGGGC 

39181 + + + + + + 39240

CTGCTGGGCCTAGACGGATGGAAGTAGCGCCGCCAGCAGGCCTGCCGGCTGCGGCGCCCG 
DDPDLPTFIAAVVRTADAAG

GGCGAGGGCTCCGGAACGCAGTGAGACTCGCCGGGCGCCGTCTCCCCGCGGCGCCCGGTT 

39241 + + + + + + 39300 

CCGCTCCCGAGGCCTTGCGTCACTCTGAGCGGCCCGCGGCAGAGGGGCGCCGCGGGCCAA 

* GEGSGTQ*- 

TCACATGGCTGAGGCGGTTCACCCGGTACCGGGTGAACCGCCTCAGCCATGTGAAACCGG 

39301 + + + + + + 39360 

AGTGTACCGACTCCGCCAAGTGGGCCATGGCCCACTTGGCGGAGTCGGTACACTTTGGCC 

GCCTGGTCAGCGCAGCTGGATGTCCGTCTCCCGGGCGATCGCCCGGAGGAACTCGCCGCG 

39361 + + + + + + 39420 

CGGACCAGTCGCGTCGACCTACAGGCAGAGGGCCCGCTAGCGGGCCTCCTTGAGCGGCGC 
24-* *RLQIDTERAIARLFEGR-

GGACAGCGCGTCGGCGACCAGCTCGATGTCGTCGGCCATGTACCGGTCGACGCCCAGCGT 

39421 + + + + + + 39480 

CCTGTCGCGCAGCCGCTGGTCGAGCTACAGCAGCCGGTACATGGCCAGCTGCGGGTCGCA 
SLADAVLEIDDAMYRDVGLT-

CGGAACCAGCCGGCGCACCGCTTCGTACGTGGCCTTCGCCGCCGGGCTCAAGCCGTCGAA 

39481 + + + + + + 39540 

GCCTTGGTCGGCCGCGTGGCGAAGCATGCACCGGAAGCGGCGGCCCGAGTTCGGCAGCTT 
PVLRRVAEYTAKAAPSLGDF-

CCGGCCGGAGATGTCGACCGCCTGGGCGGCGGCCAGGTACTCCACCGCGAGGATCTTGTT 

39541 + + + + + + 39600

GGCCGGCCTCTACAGCTGGCGGACCCGCCGCCGGTCCATGAGGTGGCGCTCCTAGAACAA 
RGSIDVAQAAALYEVALIKN-

GTTGTTCGACAGGACCCGGCGGGCGTTGCGGGCCGAGATCAGGCCCATGCTCACCACGTC 

39601 + + + + + + 39660 

CAACAAGCTGTCCTGGGCCGCCCGCAACGCCCGGCTCTAGTCCGGGTACGAGTGGTGCAG 
NNSLVRRANRASILGMSVVD-

CTGGTTGTCGCCGTTGGACGGGACGCTCTGGGTGCTGGCCGGGCCGATCGTCCGGTTCTC 

39661 + + + + + + 39720 

GACCAACAGCGGCAACCTGCCCTGCGAGACCCACGACCGGCCCGGCTAGCAGGCCAAGAG 

QNDGNSPVSQTSAPGITRNE-

GGCCACCAGTGCGGTGGCCGGGTACTGGGCGCCGGCGAATCCGCTGTGCAGCCCCGGGTC 

39721 + + + + + + 39780 

CCGGTGGTCACGCCACCGGCCCATGACCCGCGGCCGCTTAGGCGACACGTCGGGGCCCAG 
AVLATAPYQAGAFGSHLGPD- 

CCCGGAGACGAGGAACTCCGGGAGGCCGTAGCTGAGGTGCCGGTTCAGGACCCGGTTGAT 

39781 + + + + + + 39840 

GGGCGTCTGCTCCTTGAGGCCCTCCGGCATCGACTCCACGGCCAAGTCCTGGGCCAACTA 
GSVLFEPLGYSLHRNLVRNI-

CTGCCGCTCGGCCAGGACGCCGAGCTGGGTGAGCGCGATGGTCACGAAGTCCATCGCGAA 

39841 + + + + + + 39900 

GACGGCGAGCCGGTCCTGCGGCTCGACCCACTCGCGCTACCAGTGCTTCAGGTAGCGCTT 
24 QREALVGLQTLAITVFDMAF-

CGCGATCGGCTGACCGTGGAAGTTCGCCCCGTGGAAGATCTCCTTGCCCTCGAAGAAGAG 

39901 + + + + + + 39960

GCGCTAGCCGACTGGCACCTTCAAGCGGGGCACCTTCTAGAGGAACGGGAGCTTCTTCTC 
24 AIPQGHFNAGHFIEKGEFFL-

CGGGTTGTCGTTGGCCGAGTTGAGCTCGATGCGCAGCTTGTGCCGCGCGTGGTACAAGGT 

39961 + + + + + + 40020 

GCCCAACAGCAACCGGCTCAACTCGAGCTACGCGTCGAACACGGCGCGCACCATGTTCCA 
24 PNDNASNLEIRLKHRAHYLT-

GTCGCGCACCGCCCCGACGACCTGGGGGATGGCCCGCAGCGAGTAGGCCTTCTGCAGGTA 

40021 + + + + + + 40080 

CAGCGCGTGGCGGGGCTGCTGGACCCCCTACCGGGCGTCGCTCATCCGGAAGACGTCCAT 
24 DRVAGVVQPIARLSYAKQLY-

GATCTCCGAGCGCTGGACGTCCTTGCCGGCCTCCTTGTCCTTCTGGAGTTCTCGGCGCAG 

40081 + + + + + + 40140 

CTAGAGGCTCGCGACCTGCAGGAACGGCCGGAGGAACAGGAAGACCTCAAGAGCCGCGTC 
24 IESRQVDKGAEKDKQLERRL- 

GTCGGCGTGCTCGACCGTCAGTCCGCTGCCCCGCATCAGGGCCCGCATGTTGGCGGCGGT 

40141 + + + + + + 40200 

CAGCCGCACGAGCTGGCAGTCAGGCGACGGGGCGTAGTCCCGGGCGTACAACCGCCGCCA 
24 DAHEVTLGSGRMLARMNAAT- 

GTCGATCTGGCCCTCGTGCGGGCGGGCTATGTCGTGCCCCTCCGCGAGGAAGGGGCTGGT 

40201 + + + + + + 40260 

CAGCTAGACCGGGAGCACGCCCGCCCGATACAGCACGGGGAGGCGCTCCTTCCCCGACCA 
24 DIQGEHPRAIDHGEALFPST- 

CGATCCGCGTACCGCCTCGATGAGCAGAGCCGTCACGATCTCGGCCTGCTGGGCCTGCTC 

40261 + + + + + + 40320 

GCTAGGCGCATGGCGGAGCTACTCGTCTCGGCAGTGCTAGAGCCGGACGACCCGGACGAG 
24 SGRVAEILLATVIEAQQAQE-

CAGGGCCCGTCCGACGACCAGGGAGCCCAGACCGGTCATCCCGGACGTGCCGTTGATCAG 

40321 + + + + + + 40380 

GTCCCGGGCAGGCTGCTGGTCCCTCGGGTCTGGCCAGTAGGGCCTGCACGGCAACTAGTC 
24 LARGVVLSGLGTMGSTGNIL- 

TGCGAGGCCCTCCTTGAAGCGCAGTTCGAGCGGCTCGATGCCCCGCTCGGCCAGCACCTG 

40381 + + + + + + 40440 

ACGCTCCGGGAGGAACTTCGCGTCAAGCTCGCCGAGCTACGGGGCGAGCCGGTCGTGGAC 
24 ALGEKFRLELPEIGREALVQ-

GGCGGTCTCCACCGGCCGTCCGTCGCGCAGGACGTAGCCCTCTCCGATGAGGGTGCTCGC 

40441 + + + + + + 40500 

CCGCCAGAGGTGGCCGGCAGGCAGCGCGTCCTGCATCGGGAGAGGCTACTCCCACGAGCG 
24 ATEVPRGDRLVYGEGILTSA-

GACGTGGGAGAGGGGAGCCAGGTCGCCGCTCGCCCCGAGTGACCCGATCTCGGGTATGGC 

40501 + + + + + + 40560 

CTGCACCCTCTCCCCTCGGTCCAGCGGCGAGCGGGGCTCACTGGGCTAGAGCCCATACCG 
24 VHSLPALDGSAGLSGIEPIA-

CGGGGTGATGCCCTCGTTCAGGTACTGCGCGAGGCGTTCGAGGATGATGGGGCGCACCGC 

40561 + + + + + + 40620 

GCCCCACTACGGGAGCAAGTCCATGACGCGCTCCGCAAGCTCCTACTACCCCGCGTGGCG 

24 PTIGENLYQALRELIIPRVA-

GGAGTGGCCCTTGGCGAGGGTGTTCAGCCGGGCGGCGACGATCGCCCGCGCCTCGTCCTC 

40621 + + + + + + 40680

CCTCACCGGGAACCGCTCCCACAAGTCGGCCCGCCGCTGCTAGCGGGCGCGGAGCAGGAG 
24 SHGKALTNLRAAVIARAEDE-

GGCGAACAGCGGACCGACTCCCGCGCTGTGGCTACGGACGAGATTGGTCTGCAGTTCGAC 

40681 + + + + + + 40740 

CCGCTTGTCGCCTGGCTGAGGGCGCGACACCGATGCCTGCTCTAACCAGACGTCAAGCTG 
24 AFLPGVGASHSRVLNTQLEV-

TTCCTTCGACTTGTCGACCTGCATGTAGATCATCTCGCCGTACCCGGTGGTCACCCCGTA 

40741 + + + + + + 40800 

AAGGAAGCTGAACAGCTGGACGTACATCTAGTAGAGCGGCATGGGCCACCAGTGGGGCAT 
24 EKSKDVQMYIMEGYGTTVGY-

GATGGGGATGTTCTGTTCGGCGATCCCTTCGAAGATCTCCCGGCTCTTCTGGGCCTTCGC 

40801 + + + + + + 40860 

CTACCCCTACAAGACAAGCCGCTAGGGAAGCTTCTAGAGGGCCGAGAAGACCCGGAAGCG
24 IPINQEAIGEFIERSKQAKA-

GATGGATTCGGCCGGTAGGTCGACCGTCGCGCGTTCCTCCGCGACGCGGCGTACGGCTTC 

40861 + + + + + + 40920 

CTACCTAAGCCGGCCATGCAGCTGGCAGCGCGCAAGGAGGCGCTGCGCCGCATGCCGAAG 
24 ISEAPVDVTAREEAVRRVAE-

GACGGTCAGGGTCTCGCCGTCGACGGAAACCGGGACGATCTCGGTCTCGACTTGAGTCAA 

40921 + + + + + + 40980 

CTGCCAGTCCCAGAGCGGCAGCTGCCTTTGGCCCTGCTAGAGCCAGAGCTGAACTCAGTT 

24 VTLTEGDVSVPVIETEVQTL-

TGCCATCACTCCATGGGTAGCGGCCGAGGCCGGTGTACGACAGGTCAGGGGGTGGGTTCG 

40981 + + + + + + 41040 

ACGGTAGTGAGGTACCCATCGCCGGCTCCGGCCACATGCTGTCCAGTCCCCCACCCAAGC 

24- < A M - 

TGAGGCGCGGCTCAGCGGGTGAGCCGGGAGCGGTCCACCTTCCCCGCGGCGTTGCGCGGC 

41041 + + + + + + 41100 

ACTCCGCGCCGAGTCGCCCACTCGGCCCTCGCCAGGTGGAAGGGGCGCCGCAACGCGCCG 

25- * *RTLRSRDVKGAANRP - 

AGGCGTGAAGTCAGGCGGGTGAAGACGGCGGGCAGTGCGAGGGGGCCGAACTGGCCGCGC 

41101 + + + + + + 41160 

TCCGCACTTCAGTCCGCCCACTTCTGCCGCCCGTCACGCTCCCCCGGCTTGACCGGCGCG 

25 LRSTLRTFVAPLALPGFQGR- 

AGATGGGAACGCCAGGCCCGGATGTCCGCGCGCACGTCCTCCCGGCCCTCTCCTTGTGGC 

41161 + + + + + + 41220 

TCTACCCTTGCGGTCCGGGCCTACAGGCGCGCGTGCAGGAGGGCCGGGAGAGGAACACCG 
25 LHSRWARIDARVDERGEGQP 

ACCACGTACACGGCGAGGCGGGTCACCAGGCCCTGGCCGTTGACGTGGGGGAGGACCGCG 

41221 + + + + + + 41280 

TGGTGCATGTGCCGCTCCGCCCAGTGGTCCGGGACCGGCAACTGCACCCCCTCCTGGCGC 
25 VVYVALRTVLGQGNVHPLVA- 

CACTCCAGGACCGAGGGGTCACGGTTCAGCGCGGCCTCGATCTCGGTGAGTTCCAAGCGG 

41281 + + + + + + 41340 

GTGAGGTCCTGGCTCCCCAGTGCCAAGTCGCGCCGGAGCTAGAGCCACTCAAGGTTCGCC 
25 CELVS PDRNLAAE I ETLELR- 

TTCCCGAACAGCTTGACCTGGAAGTCCTTGCGGCCCCGGAATTCCAGGGCTCCGTCGAAC 

41341 + + + + + + 41400 

AAGGGCTTGTCGAACTGGACCTTCAGGAACGCCGGGGCCTTAAGGTCCCGAGGCAGCTTG 
25 NGFLKVQFDKRGRFELAGDF- 

CGTACCCGCGCCAGATCCCCGGTCCGGTACCACCGGTCACCGTCCGGGGCGAGGCCGGCG 

41401 + + + + + + 41460 

GCATGGGCGCGGTCTAGGGGCCAGGCCATGGTGGCCAGTGGCAGGCCCCGCTCCGGCCGC 
25 RVRALDGTRYWRDGD PALGA- 
AGGGGCGCGAACAGCGCGCTGTGGTCCGGGCCGCCCTCGACGGCGAGATAACCCGGCGTC

41461 + + + + + + 41520 

TCCCCGCGCTTGTCGCGCGACACCAGGCCCGGCGGGAGCTGCCGCTCTATTGGGCCGCAG 
LPAFLASHDPGGEVALYGPT- 

ACGTACGGGGAGCGGATCACCAGTTCGCCGGTGACGCCGGCGGGGCTCGGCCGGTCGTCC 

41521 + + + + + + 41580 

TGCATGCCCCTCGCCTAGTGGTCAAGCGGCCACTGCGGCCGCCCCGAGCCGGCCAGCAGG 
VYPSRIVLEGTVGAPSPRDD 

GCGTCCACGACGAGTACCTGGCGGCCGGGGAGCGGGTACCCGATCGGGGCCGGGCCCGTG 

41581 + + + + + + 41640 

CGCAGGTGCTGCTCATGGACCGCCGGCCCCTCGCCCATGGGCTAGCCCCGGCCCGGGCAC 
ADVVLVQRGPLPYGI PAPGT 

ACCGGCCCGGTGATCTCGTGCCAGGTCGCGGCGATCGTCTCGGTGGGCCCGTAGAGGTTG 

41641 + + + + + + 41700 

TGGCCGGGCCACTAGAGCACGGTCCAGCGCCGCTAGCAGAGCCACCCGGGCATCTCCAAC 
VPGTIEHWTAAITETPGYLN- 

ATCAGGCGGGTCCGGGGCAGGGCCGCGCGCAGTCCGTCCACGAGTTCGCCGGGCAGCGCC 

41701 + + + + + + 41760 

TAGTCCGCCCAGGCCCCGTCCCGGCGCGCGTCAGGCAGGTGCTCAAGCGGCCCGTCGCGG 
ILRTRPLAARLGDVLEGPLA- 

TCGCCCATCAGGAGCAGGTGGCCCAGGGTGCCGGGCCGATCGCCCGGGTCGGAGGCGGTG 

41761 + + + + + + 41820 

AGCGGGTAGTCCTCGTCCACCGGGTCCCACGGCCCGGCTAGCGGGCCCAGCCTCCGCCAC 
EGMLLLHGLTGPRDGPDSAT- 

ATCACTCCCAGGAGGTCCCGGGCGAAGCTGGGCACGGTCTGGAGATGAGTGATCCGCTCC 

41821 + + + + + + 41880 

TAGTGAGGGTCCTCCAGGGCCCGCTTCGACCCGTGCCAGACCTCTACTCACTAGGCGAGG 
IVGLLDRAFSPVTQLHTIRE- 

TGGACGAGCCACGGCACCAGCTTGTCGGGGTTCACCCTGACGCGCTCCGGCACCGGACAC 

41881 + + + + + + 41940 

ACCTGCTCGGTGCCGTGGTCGAACAGCCCCAAGTGGGACTGCGCGAGGCCGTGGCCTGTG 

QVLWPVLKDPNVRVREPVPC- 

AGCGTCCCGCCGGCCACGAGCGTCGCGAAGACCTCGGCCAGCGCCGGGTCGTGCTCCGGG 

41941 + + + + + + 42000

TCGCAGGGCGGCCGGTGCTCGCAGCGCTTCTGGAGCCGGTCGCGGCCCAGCACGAGGCCC 
LTGGAVLTAFVEALAPDHEP - 



SEQ ID No. 2. C-1027 gene cluster DNA sequence from 41,980 to 63,164 

AGCGCCGGGTCGTGCTCCGGGGAGACCCACTGCGCCACCCGCGCGCCCGGCCCCATCGCG 

41980 + + + + + + 42039 

TCGCGGCCCAGCACGAGGCCCCTCTGGGTGACGCGGTGGGCGCGCGGGCCGGGGTAGCGC 

LAPDHEPSVWQAVRAGPGMA- 

AACCGTTCGCCCATCCAGCCCGCGAACTGGCCCAGCGCGGCATGCGACTGGGCGATCCCC 

42040 + + + + + + 42099 

TTGGCAAGCGGGTAGGTCGGGCGCTTGACCGGGTCGCGCCGTACGCTGACCCGCTAGGGG 
FREGMWGAFQGLAAHSQAIG- 

TTGGGCCGCCCGGTCGAACCCGAGGTGAACGCCACGTAGGCCAGGTCTGCCAGGCCCGGC 

42100 + + + + + + 42159 

AACCCGGCGGGCCAGCTTGGGCTCCACTTGCGGTGCATCCGGTCCAGACGGTCCGGGCCG 
25 KPRGTSGSTFAVYALDALGP-

CCCGCCGCGGTCGTCGCGTCCGGGCCGGCGGCGGGTCGAGGGCCGAGCACAGAGGAGGCG 

42160 + + + + + + 42219 

GGGCGGCGCCAGCAGCGCAGGCCCGGCCGCCGCCCAGCTCCCGGCTCGTGTCTCCTCCGC 
25 GAATTADPGAAPRPGLVSSA- 

TCCAGCAGGGTGGCGCCCGGTTCACCGGCGTACCAGAGCGCCAGCGGATCCTCCTGCGGA 

42220 + + + + + + 42279 

AGGTCGTCCCACCGCGGGCCAAGTGGCCGCATGGTCTCGCGGTCGCCTAGGAGGACGCCT 
25 DLLTAGPEGAYWLALPDEQP 

TCGCCGTCGAGGACCAGGCACGCCGGGCGCAGATCGCTGAGCATCGACCGGTGTCGTTCG 

42280 + + + + + + 42339 

AGCGGCAGCTCCTGGTCCGTGCGGCCCGCGTCTAGCGACTCGTAGCTGGCCACAGCAAGC 
25 DGDLVLCAPRLDSLMSRHRE- 

CCCGCGCCGTCCGGAGCGAACCACGCCAGGTGGGCGCCCGCCTCCAGGACTCCCAGCAGC 

42340 + + + + + + 42399 

GGGCGCGGCAGGCCTCGCTTGGTGCGGTCCACCCGCGGGCGGAGGTCCTGAGGGTCGTCG 
25 GAGDPAFWALHAGAELVGLL- 

ACCGCGATCCGGCGGGCGCCCGGCTGCATCCGCACCGCCACCGGCGAGCCGTGCCCCGCG 

42400 + + + + + + 42459 

TGGCGCTAGGCCGCCCGCGGGCCGACGTAGGCGTGGCGGTGGCCGCTCGGCACGGGGCGC 
25 VAIRRAGPQMRVAVPSGHGA- 

CCGGCCGCGGTGAGGGCCGAGGCGACGCGGGCCGCGTCCGCGGTCAGTTCGGCGGTCAGT 

42460 + + + + + + 42519 

GGCCGGCGCCACTCCCGGCTCCGCTGCGCCCGGCGCAGGCGCCAGTCAAGGCGCCAGTCA 
25 GAATLASAVRAADATLEATL 

TCGGCGTAGCTTGTGCGCGTGCCGCCGAACGAGACGGCGACACCGTCGTGTTCCGCGTGG 

42520 + + + + + + 42579 

AGCCGCATCGAACACGCGCACGGCGGCTTGCTCTGCCGCTGTGGCAGCACAAGGCGCACC 
25 EAYSTRTGGFSVAVGDHEAH 

CGGCGGACCGAGGCGTGCACCGGCCGCGTCATGTCCCCGCCGGACGCCCGGCGGTCCGAA 

42580 + + + + + + 42639 

GCCGCCTGGCTCCGCACGTGGCCGGCGCAGTACAGGGGCGGCCTGCGGGCCGCCAGGCTT 
25 RRVSAHVPRTM (ORF25) 

GCGCGCAGGGCGTGGTCCCGGTGGCGGTCGTCGTCCAGCGGCAGAGCGCCCACGGGTGTG 

42640 + + + + + + 42699 

CGCGCGTCCCGCACCAGGGCCACCGCCAGCAGCAGGTCGCCGTCTCGCGGGTGCCCACAC 

TCCGGATCCGTGGTCGCGGCGGTCAGGAGGACGGCCAGCTGATCCAGCATCCGCCGGGCC 

42700 + + + + + + 42759 

AGGCCTAGGCACCAGCGCCGCCAGTCCTCCTGCCGGTCGACTAGGTCGTAGGCGGCCCGG 

GAAGCGGGCTCGAACAGAGCTTCGCGGTACTCCAGGTAGCCGGTGACCGAGGGCGCGGTG 

42760 + + + + + + 42819 

CTTCGCCCGAGCTTGTCTCGAAGCGCCATGAGGTCCATCGGCCACTGGCTCCCGCGCCAC 

TCCTGCAGCACCAGGGTCAGGTCGGCGGCGGCAGTGCCGTTGTGCACGGACAGCCGCCTC 

42820 + + + + + + 42879 

AGGACGTCGTGGTCCCAGTCCAGCCGCCGCCGTCACGGCAACACGTGCCTGTCGGCGGAG 

ACCTCGGCGCCTGGTATCCGCAGGCCCGGCCGCTCCTCGTGGACGAACACGGCGTCGGCC 

42880 + + + + + + 42939 

TGGAGCCGCGGACCATAGGCGTCCGGGCCGGCGAGGAGCACCTGCTTGTGCCGCAGCCGG 

CCCTCGATCCGGCACGGCCCGGGGGCCGGGGCCGGCGTCGTGTGCAGCAGCTCCCGGAAG 

42940 + + + + + + 42999 

GGGAGCTAGGCCGTGCCGGGCCCCCGGCCCCGGCCGCAGCACACGTCGTCGAGGGCCTTC 
GCGGTGGCCGGCGTGCCGTCGTCCTGTCCGGCGTAGCGCTGGACCAGGGCTCGGAATCCG

43000 + + + + + + 43059 

CGCCACCGGCCGCACGGCAGCAGGACAGGCCGCATCGCGACCTGGTCCCGAGCCTTAGGC 

GCCAGCACCACGGCCGCGGCGGTGACCCCTTCCGCTTCGGCGAGCCGGGCCGTACGGAAG 

43060 + + + + + + 43119 

CGGTCGTGGTGCCGGCGCCGCCACTGGGGAAGGCGAAGCCGCTCGGCCCGGCATGCCTTC 

CCGAGGTCCGGACTCCAGCCGAAGGCGACGGTGCTCCCCGCGTGCGAGGGCAGGTGCGGG 

43120 + + + + + + 43179 

GGCTCCAGGCCTGAGGTCGGCTTCCGCTGCCACGAGGGGCGCACGCTCCCGTCCACGCCC 

CGGTTCCGGTCGGCGGGCAGGACCTGTCCGGAGGCGGTCGCCGAAGACTCCTCGCTCCCG 

43180 + + + + + + 43239 

GCCAAGGCCAGCCGCCCGTCCTGGACAGGCCTCCGCCAGCGGCTTCTGAGGAGCGAGGGC 

GGCGCCCGGGGCGTTTGCGGCGCGGGCGCAGTGGGAGGCCGGCCGCCGGTGGTGACGGCG 

43240 + + + + + + 43299 

CCGCGGGCCCCGCAAACGCCGCGCCCGCGTCACCCTCCGGCCGGCGGCCACCACTGCCGC 

AGGTACGCGTTCGACAACGCGGCCGGCAGGGGCCCGGACGGCCCGTCCCAGGCTCCGGAG 

43300 + + + + + + 43359 

TCCATGCGCAAGCTGTTGCGCCGGCCGTCCCCGGGCCTGCCGGGCAGGGTCCGAGGCCTC 

TGCGAGGCCACCAGGAGAAGCAGGTGCGCGCGTGGGCCTCTGCGGGCGATGTGGAGCCGT 

43360 + + + + + + 43419 

ACGCTCCGGTGGTCCTCTTCGTCCACGCGCGCACCCGGAGACGCCCGCTACACCTCGGCA 

GCGGGCGCGTCACCCTCGGCGAAGGGACGGGCCGCCCAGCGAGCGCAGAGTTCCTCCTCC 

43420 + + + + + + 43479 

CGCCCGCGCAGTGGGAGCCGCTTCCCTGCCCGGCGGGTCGCTCGCGTCTCAAGGAGGAGG 

CCGCACTCCTCGTCGGCACTCGGCCCGTCCACGGCGGCCCCGTCTCCGGCGGCGGCCCGC 

43480 + + + + + + 43539 

GGCGTGAGGAGCAGCCGTGAGCCGGGCAGGTGCCGCCGGGGCAGAGGCCGCCGCCGGGCG 

CAGGCCGTCCGCAGGGCCTCCAGGTCGAGTCCGCCGCTCACGTGGTAGGCCGCGTACGGG 

43540 + + + + + + 43599 

GTCCGGCAGGCGTCCCGGAGGTCCAGCTCAGGCGGCGAGTGCACCATCCGGCGCATGCCC 

TGCAACACCGCAGATCCGGAGGCCGGCGAAGGCCCCCGGTCCGGCTCGGTCACAGTCACG 

43600 + + + + + + 43659 

ACGTTGTGGCGTCTAGGCCTCCGGCCGCTTCCGGGGGCCAGGCCGAGCCAGTGTCAGTGC 

TCATTCGCCACGACGCCCATCTTGGGGCGGCGGCGCACAGGACGCTTCTCCTTGAGTGCG 

43660 + + + + + + 43719 

AGTAAGCGGTGCTGCGGGTAGAACCCCGCCGCCGCGTGTCCTGCGAAGAGGAACTCACGC 

GAGCTCCGCGTACGGCGCCGAAGCGTTCGGTCAAACCTTGTTCGACCAACTGCGCAATCT 

43720 + + + + + + 43779 

CTCGAGGCGCATGCCGCGGCTTCGCAAGCCAGTTTGGAACAAGCTGGTTGACGCGTTAGA 

GGAAGTTGACGTCTTCCAGGTGGAGTTGGGAACGATGGAGGCCCCCGCCGGCCGCGTCGG 

43780 + + + + + + 43839 

CCTTCAACTGCAGAAGGTCCACCTCAACCCTTGCTACCTCCGGGGGCGGCCGGCGCAGCC 

AACGGCCGTGCAGTGCGGCCCTCTCCAACACTCCCGGCCATCGCGGAATCCGAGACGTGC 

43840 + + + + + + 43899 

TTGCCGGCACGTCACGCCGGGAGAGGTTGTGAGGGCCGGTAGCGCCTTAGGCTCTGCACG 

CCGAAGGAGCCCCCCTTGCAAGCCTGGTTCAAGCGCACCAGTGGTGTGCCCGGTGACAGA 

43900 + + + + + + 43959 

GGCTTCCTCGGGGGGAACGTTCGGACCAAGTTCGCGTGGTCACCACACGGGCCACTGTCT 
27 (ORF27) V P G D R 
CGTGGAAAGTGGCTGGTCCTGGCCGCCTGGCTCATCATCGCGATGGCGCTGGGCCCGCTG

43960 + + + + + + 44019 

GCACCTTTCACCGACCAGGACCGGCGGACCGAGTAGTAGCGCTACCGCGACCCGGGCGAC 
27 RGKWLVLAAWLI IAMALGPL 

GCGGGGAAGCTCGCCGACGTCCAGGACTCCAGCGCCAACGCCTTCCTTCCGCGCAGCTCG 

44020 + + + + + + 44079 

CGCCCCTTCGAGCGGCTGCAGGTCCTGAGGTCGCGGTTGCGGAAGGAAGGCGCGTCGAGC 
27 AGKLADVQDS SANAFLPRS S 

GAGTCCGCGAAGCTGAACAAGGAACTGGAGAAGTTCCGCGCCGACGAGCTGATGCCGGCC 

44080 + + + + + + 44139 

CTCAGGCGCTTCGACTTGTTCCTTGACCTCTTCAAGGCGCGGCTGCTCGACTACGGCCGG 
27 ESAKLNKELEKFRADELMPA 

GTGGTGGTCTACAGCGCCGACGGCTCGCTGCCCGCCGAGGGGCGGGCCAAGGCCGAGAAG 

44140 + + + + + + 44199 

CACCACCAGATGTCGCGGCTGCCGAGCGACGGGCGGCTCCCCGCCCGGTTCCGGCTCTTC 
27 VVVYSADGSLPAEGRAKAEK 

GACATAGCCGCCTTCCAGGAGCTGGCCGCCGAGGGCGAGAAGGTCGAAGCGCCCCTGGAG

44200 + + + + + + 44259 

CTGTATCGGCGGAAGGTCCTCGACCGGCGGCTCCCGCTCTTCCAGCTTCGCGGGGACCTC 
27 DIAAFQELAAEGEKVEAPLE 

TCGGAGGACGGCCAGGCGCTCATGGTCGTCGTTCCGCTGATCAGCGACGCCGACATCGTC 

44260 + + + + + + 44319 

AGCCTCCTGCCGGTCCGCGAGTACCAGCAGCAAGGCGACTAGTCGCTGCGGCTGTAGCAG 
27 SEDGQALMVVVPLI SDADIV 

GCCACGACGAAGAAGGTCCGCGATGTCGCGGACGCCAACGCCCCCCCGGGCGTCGCCATC 

44320 + + + + + + 44379 

CGGTGCTGCTTCTTCCAGGCGCTACAGCGCCTGCGGTTGCGGGGGGGCCCGCAGCGGTAG 
27 ATTKKVRDVADANAPPGVAI 

GAGGTGGGCGGGCCCGCCGGGTCGACGACCGACGCCGCCGGCGCTTTCGAGTCCCTCGAC 

44380 + + + + + + 44439 

CTCCACCCGCCCGGGCGGCCCAGCTGCTGGCTGCGGCGGCCGCGAAAGCTCAGGGAGCTG 
27 EVGGPAGSTTDAAGAFESLD 

TCCATGCTGATGATGGTCACCGGCCTTGTGGTCGCCATCCTGCTGCTGATCACCTACCGC 

44440 + + + + + + 44499 

AGGTACGACTACTACCAGTGGCCGGAACACCAGCGGTAGGACGACGACTAGTGGATGGCG 
27 SMLMMVTGLVVAILLLITYR 

TCCCCCATCCTGTGGCTGCTGCCCCTGCTCTCCGTCGGCTTCGCCTCCGTGCTGACCCAG 

44500 + + + + + + 44559 

AGGGGGTAGGACACCGACGACGGGGACGAGAGGCAGCCGAAGCGGAGGCACGACTGGGTC 
27 SPILWLLPLLSVGFASVLTQ 

GTCGGCACCTACATGCTCGCCAAGTACGCCGGGCTGCCGGTCGACCCGCAGAGCTCCGGC 

44560 + + + + + + 44619 

CAGCCGTGGATGTACGAGCGGTTCATGCGGCCCGACGGCCAGCTGGGCGTCTCGAGGCCG 
27 VGTYMLAKYAGLPVDPQSSG 

GTCCTGATGGTCCTCGTGTTCGGTGTCGGCACCGACTACGCCCTGCTGCTCATCGCCCGC 

44620 + + + + + + 44679 

CAGGACTACCAGGAGCACAAGCCACAGCCGTGGCTGATGCGGGACGACGAGTAGCGGGCG 
27 VLMVLVFGVGTDYALLLIAR 

TACCGTGAGGAACTGCGCCGCGAGCAGGACCGGCACGTGGCCATGAAGACCGCGTTGCGA 

44680 + + + + + + 44739 

ATGGCACTCCTTGACGCGGCGCTCGTCCTGGCCGTGCACCGGTACTTCTGGCGCAACGCT 
27 YREELRREQDRHVAMKTALR 

CGGTCGGGCCCGGCCATCCTGGCCTCGGCCGGCACCATCGCCATCGGCCTCGTCTGCCTG 
44740 + + + + + + 44799

GCCAGCCCGGGCCGGTAGGACCGGAGCCGGCCGTGGTAGCGGTAGCCGGAGCAGACGGAC 
27 RSGPAILASAGTIAIGLVCL 

GTCCTCGCGGACGTCAACTCCTCCCGCTCCATGGGCCTGGTCGGCGCGATCGGCGTGGTC 

44800 + + + + + + 44859 

CAGGAGCGCCTGCAGTTGAGGAGGGCGAGGTACCCGGACCAGCCGCGCTAGCCGCAGCAG 
27 VLADVNS SRSMGLVGAI GVV 

TGCGCCCTCCTCGCCATGGTCACGATCCTGCCCGCGCTGCTGGTCATCCTGGGCCGCTGG 

44860 + + + + + + 44919 

ACGCGGGAGGAGCGGTACCAGTGCTAGGACGGGCGCGACGACCAGTAGGACCCGGCGACC 
27 CALLAMVTILPALLVILGRW 

GTGTTCTGGCCCTTCGTTCCCCGCTGGACGCCGGAGTCGGCCGCGGCCCCCGAGGCACCG 

44920 + + + + + + 44979 

CACAAGACCGGGAAGCAAGGGGCGACCTGCGGCCTCAGCCGGCGCCGGGGGCTCCGTGGC 
27 VFWPFVPRWTPESAAAPEAP 

GCGTCCCACAGCCGCTGGGAGCGCATCGGCTCCGTCACGGCCGCCCGGCCGCGCCGCGCC 

44980 + + + + + + 45039 

CGCAGGGTGTCGGCGACCCTCGCGTAGCCGAGGCAGTGCCGGCGGGCCGGCGCGGCGCGG 
27 ASHSRWERIGSVTAARPRRA 

TGGGTGCTGTCCTTGGCCGCGACGGGGCTTCTCGCCCTCAGTTCCCTCGGCCTCGACATG 

45040 + + + + + + 45099 

ACCCACGACAGGAACCGGCGCTGCCCCGAAGAGCGGGAGTCAAGGGAGCCGGAGCTGTAC 
27 WVLSLAATGLLALS S LGLDM 

GGACTCACCCAGAGCGAACTGCTCCAGACGAAGCCCGAGTCCGTCGTCGCCCAGGAGCGG 

45100 + + + + + + 45159 

CCTGAGTGGGTCTCGCTTGACGAGGTCTGCTTCGGGCTCAGGCAGCAGCGGGTCCTCGCC 
27 GLTQSELLQTKPESVVAQER 

ATCTCCGCCCACTACCCGTCCGGCTCCTCCGACCCCGCCACCGTCGTCGCACCCAGCGCG 

45160 + + + + + + 45219 

TAGAGGCGGGTGATGGGCAGGCCGAGGAGGCTGGGGCGGTGGCAGCAGCGTGGGTCGCGC 
27 ISAHYPSGSSDPATVVAPSA 

GACGTGGCCGAGGTCCGCCGGGCCGCCGAGGGGACCGACGGAGTGGTCTCCGTCCAGGAC 

45220 + + + + + + 45279 

CTGCACCGGCTCCAGGCGGCCCGGCGGCTCCCCTGGCTGCCTCACCAGAGGCAGGTCCTG 
27 DVAEVRRAAEGTDGVVSVQD 

GGCCCCACCACTCCCGACGGAGAGCTGACCATGCTGTCCGTGGTGCTGAAGGACGTTCCC 

45280 + + + + + + 45339 

CCGGGGTGGTGAGGGCTGCCTCTCGACTGGTACGACAGGCACCACGACTTCCTGCAAGGG 
27 GPTTPDGELTMLSVVLKDVP 

GACAGCAGCGGGGCCAAGGACACCATCGATGCACTGCGGGACAACACGGATGCTCTCGTG 

45340 + + + + + + 45399 

CTGTCGTCGCCCCGGTTCCTGTGGTAGCTACGTGACGCCCTGTTGTGCCTACGAGAGCAC 
27 DSSGAKDTIDALRDNTDALV 

GGGGGTACGACGGCCCAGAGCCTGGACACCCAGCGCGCCTCGGTCCGTGACCTCTGGGTC 

45400 + + + + + + 45459 

CCCCCATGCTGCCGGGTCTCGGACCTGTGGGTCGCGCGGAGCCAGGCACTGGAGACCCAG 
27 GGTTAQSLDTQRASVRDLWV 

ACCGTCCCCGCGGTCCTGCTGGTGGTCCTGCTCGTCCTGATCTGGCTGCTGCGCTCGGTC 

45460 + + + + + + 45519 

TGGCAGGGGCGCCAGGACGACCACCAGGACGAGCAGGACTAGACCGACGACGCGAGCCAG
27 TVPAVLLVVLLVLIWLLRSV 

ACCGGACCGCTGATCATGCTCGGCACCGTGGTCGTGTCGTTCTTCGCGGCCCTGGGGGCG 
45520 + + + + + + 45579 
TGGCCTGGCGACTAGTACGAGCCGTGGCACCAGCACAGCAAGAAGCGCCGGGACCCCCGC
27 TGPL I MLGTVVVS FFAALGA 

TCCAACCTGCTCTTCGAGTACGTGATGGGGCACGCCGGCGTCGACTGGTCGGTGCCGCTT 

45580 + + + + + + 45639 

AGGTTGGACGAGAAGCTCATGCACTACCCCGTGCGGCCGCAGCTGACCAGCCACGGCGAA 

27 SNLLFEYVMGHAGVDWSVPL 

CTCGGGTTCGTGTACCTGGTCGCCCTCGGAATCGACTACAACATCTTCCTCATGCACCGG 

45640 + + + + + + 45699 

GAGCCCAAGCACATGGACCAGCGGGAGCCTTAGCTGATGTTGTAGAAGGAGTACGTGGCC 
27 LGFVYLVALGIDYNI FLMHR 

GTGAAGGAGGAGGTCGCTCTGCACGGCCATGCCAAGGGCGTGCTCACCGGCCTGACCACC 

45700 + + + + + + 45759

CACTTCCTCCTCCAGCGAGACGTGCCGGTACGGTTCCCGCACGAGTGGCCGGACTGGTGG 
27 VKEEVALHGHAKGVLTGLTT 

ACCGGGGGCGTCATCACCAGTGCCGGCGTGGTCCTGGCCGCGACGTTCGCCGTCATCGCC 

45760 + + + + + + 45819 

TGGCCCCCGCAGTAGTGGTCACGGCCGCACCAGGACCGGCGCTGCAAGCGGCAGTAGCGG 

27 TGGVI TSAGVVLAATFAVIA 

ACACTGCCGCTGGTCCCGATGGCCCAGATGGGTGTCGTGGTCGGCCTGGGCATTCTGCTG 

45820 + + + + + + 45879 

TGTGACGGCGACCAGGGCTACCGGGTCTACCCACAGCACCAGCCGGACCCGTAAGACGAC
27 TLPLVPMAQMGVVVGLGILL 

GACACCTTCCTCGTCCGGACGATTCTTCTGCCGGCCCTGGCGCTCGATCTGGGGCCCCGG 

45880 + + + + + + 45939 

CTGTGGAAGGAGCAGGCCTGCTAAGAAGACGGCCGGGACCGCGAGCTAGACCCCGGGGCC 
27 DTFLVRTILLPALALDLGPR 

TTCTGGTGGCCGGGCGCGCTGTCGAAGACGTCCGGGGGACCGGCCCCCGTCCGCGAGGAC 

45940 + + + + + + 45999 

AAGACCACCGGCCCGCGCGACAGCTTCTGCAGGCCCCCTGGCCGGGGGCAGGCGCTCCTG 
27 FWWPGALSKTSGGPAPVRED 

CGCACGTCCCAGCCCGTGGGCTGAGACCCGTCCCGACGAGACCCGTACGGCGGGCGGCCG 

46000 + + + + + + 46059 

GCGTGCAGGGTCGGGCACCCGACTCTGGGCAGGGCTGCTCTGGGCATGCCGCCCGCCGGC 

27 RTSQPVG* (ORF27) 

GTTCCCCCGGGCCGTACGACTGAGCAACCCAGAAGATGGGCCGCCCGCGACCAGGCGTCA 

46060 + + + + + + 46119 

CAAGGGGGCCCGGCATGCTGACTCGTTGGGTCTTCTACCCGGCGGGCGCTGGTCCGCAGT 

CGATGGTGGCCCACCGGCCGCAGGCCGATCTCCCGGAAGGAAGCGCCGTGTTGGGCGATG 

46120 + + + + + + 46179 

GCTACCACCGGGTGGCCGGCGTCCGGCTAGAGGGCCTTCCTTCGCGGCACAACCCGCTAC 

28 (ORF28) V L G D E -

AGGACGGCAAGGCCGCCGAGCTGTGGTCGATGGCGAACCTGGGTACACCGATGGCCGTGC 

46180 + + + + + + 46239

TCCTGCCGTTCCGGCGGCTCGACACCAGCTACCGCTTGGACCCATGTGGCTACCGGCACG 

28 DGKAAELWSMANLGTPMAVR- 

GCGTCGCGGCGACCCTGCGCATCGCCGACCACATCACGGCCGGAGCGCACACCGCCGGCG 

46240 + + + + + + 46299

CGCAGCGCCGCTGGGACGCGTAGCGGCTGGTGTAGTGCCGGCCTCGCGTGTGGCGGCCGC 
28 VAATLRIADHI TAGAHTAGE- 

AAATCGCCGAAGCGGCCGCCGTGCACGAGGAATCCCTCGACCGGCTGCTGCGCTACCTCA 

46300 + + + + + + 46359 

TTTAGCGGCTTCGCCGGCGGCACGTGCTCCTTAGGGAGCTGGCCGACGACGCGATGGAGT 
28 IAEAAAVHEESLDRLLRYLT- 
CCGTCCGGGGCCTGCTGGACCGTGACGGGCTCGGCCGGTACACGCTGACCCCCCTGGGCC

46360 + + + + + + 46419 

GGCAGGCCCCGGACGACCTGGCACTGCCCGAGCCGGCCATGTGCGACTGGGGGGACCCGG 
28 VRGLLDRDGLGRYTLTPLGR- 

GGCCGCTGTGCGAGGACCACCCCGCCGGCGTCCGGGCCTGGTTCGACATGGAGGGAGCGG 

46420 + + + + + + 46479 

CCGGCGACACGCTCCTGGTGGGGCGGCCGCAGGCCCGGACCAAGCTGTACCTCCCTCGCC 
28 PLCEDHPAGVRAWFDMEGAG- 

GGCGGGGCGAGCTGTCGTTCGTCGACCTGCTGCACAGCGTACGGACCGGGAAGGCCGCCT 

46480 + + + + + + 46539 

CCGCCCCGCTCGACAGCAAGCAGCTGGACGACGTGTCGCATGCCTGGCCCTTCCGGCGGA 
28 RGELSFVDLLHSVRTGKAAF- 

TCCCCCTGCGCTACGGCCGCCCCTTCTGGGAGGACCTGGCGGAGGACCCCCGCCGCGCGG 

46540 + + + + + + 46599 

AGGGGGACGCGATGCCGGCGGGGAAGACCCTCCTGGACCGCCTCCTGGGGGCGGCGCGCC 
28 PLRYGRPFWEDLAED PRRAE- 

AGTCCTTCAACCGGCTGCTCGGCCAGGACGTCGCCACTCGCGCCCCGGCCGTGGTGGCCG 

46600 + + + + + + 46659 

TCAGGAAGTTGGCCGACGAGCCGGTCCTGCAGCGGTGAGCGCGGGGCCGGCACCACCGGC 
28 SFNRLLGQDVATRAPAVVAG- 

GCTTCGACTGGGCGAGCACCGGTCATGTCATCGACCTCGGAGGCGGCGACGGCTCCCTGC 

46660 + + + + + + 46719 

CGAAGCTGACCCGCTCGTGGCCAGTACAGTAGCTGGAGCCTCCGCCGCTGCCGAGGGACG 
28 FDWASTGHVIDLGGGDGSLL- 

TGACCGCACTGCTGACCGCCTGTCCGTCACTGCGCGGCACGGTCCTGGACCTGCCCGAAG 

46720 + + + + + + 46779

ACTGGCGTGACGACTGGCGGACAGGCAGTGACGCGCCGTGCCAGGACCTGGACGGGCTTC 
28 TALLTAC P SLRGTVLDL P EA- 

CGGTGCAGCGTGCCAAGGAGTCGTTCGCCGTGTCCGGACTGGACGACCGGGCGAACGCGG 

46780 + + + + + + 46839 

GCCACGTCGCACGGTTCCTCAGCAAGCGGCACAGGCCTGACCTGCTGGCCCGCTTGCGCC 
28 VQRAKES FAVSGLDDRANAV- 

TCGCGGGCAGCTTCTTCGACGCCCTCCCCGCCGGCGCGGGCGCCTACGTCCTGTCCCTGG 

46840 + + + + + + 46899 

AGCGCCCGTCGAAGAAGCTGCGGGAGGGGCGGCCGCGCCCGCGGATGCAGGACAGGGACC 
28 AGS FFDALPAGAGAYVLSLV- 

TCCTGCACGACTGGGACGACGAGGCGTCCGTCGCGATCCTGCGGCGCTGCGCCGAGGCGG 

46900 + + + + + + 46959 

AGGACGTGCTGACCCTGCTGCTCCGCAGGCAGCGCTAGGACGCCGCGACGCGGCTCCGCC 
28 LHDWDDEASVAI LRRCAEAA- 

CGGGGCAGACGGGATCGGTGTTCGTCATCGAGTCGACCGGCTCGGCGGGGGACGCCCCGC 

46960 + + + + + + 47019

GCCCCGTCTGCCCTAGCCACAAGCAGTAGCTCAGCTGGCCGAGCCGCCCCCTGCGGGGCG 
28 GQTGSVFVI ESTGSAGDAPH- 

ACACAGGTATGGACCTGCGCATGCTGTGCATCTACGGAGCCAAGGAGCGCCGCGTGGAGG 

47020 + + + + + + 47079 

TGTGTCCATACCTGGACGCGTACGACACGTAGATGCCTCGGTTCCTCGCGGCGCACCTCC 
28 TGMDLRMLCIYGAKERRVEE- 

AGTTCGAGGAACTCGCCGGCCGGGCCGGGCTCCGGGTCGTCGCCGTCCACCCCGCGGGCC 

47080 + + + + + + 47139 

TCAAGCTCCTTGAGCGGCCGGCCCGGCCCGAGGCCCAGCAGCGGCAGGTGGGGCGCCCGG 
28 FEELAGRAGLRVVAVH PAGP- 
CTTCCGCGATCATCCAGATGTCCGCGGTCTGACCGCCCGGAGCCCCGGCCCATCGCGGCG

47140 + + + + + + 47199 

GAAGGCGCTAGTAGGTCTACAGGCGCCAGACTGGCGGGCCTCGGGGCCGGGTAGCGCCGC 

28 SAIIQMSAV* (ORF28) 

CGGGCCACGGCAGACAAGGAGAGAGCGTATGGCCGGCCTGGTCATGTCGCCGGTGGAGGC 
47200 + + + + + + 47259 

GCCCGGTGCCGTCTGTTCCTCTCTCGCATACCGGCCGGACCAGTACAGCGGCCACCTCCG 

(ORF29) MAGLVMS P V E A - 

GCTCGACGCGCTGGGCACGGTGCAGGGGCGTCAGGACCCCTATCCCTTCTACGAGGCGAT 

47260 + + + + + + 47319 

CGAGCTGCGCGACCCGTGCCACGTCCCCGCAGTCCTGGGGATAGGGAAGATGCTCCGCTA 
29 LDALGTVQGRQDPYPFYEAI 

CCGCGCGCACGGGCAGGCGGTCCCCACGAAGCCCGGCCGCTTCGTGGTGGTCGGCCACGA 
47320 + + + + + + 47379 

GGCGCGCGTGCCCGTCCGCCAGGGGTGCTTCGGGCCGGCGAAGCACCACCAGCCGGTGCT 

29 RAHGQAVPTKPGRFVVVGHD 

CGCGTGCGACCGGGCGCTGCGGGAACCGGCCCTGCGCGTCCAGGACGCCAGGAGCTACGA 

47380 + + + + + + 47439 

GCGCACGCTGGCCCGCGACGCCCTTGGCCGGGACGCGCAGGTCCTGCGGTCCTCGATGCT 
29 ACDRALREPALRVQDARSYD 

CGTCGTCTTCCCCTCGTGGCGGTCGCACTCCTCGGTCCGGGGGTTCACCAGCTCCATGCT 

47440 + + + + + + 47499 

GCAGCAGAAGGGGAGCACCGCCAGCGTGAGGAGCCAGGCCCCCAAGTGGTCGAGGTACGA 
29 VVFPSWRSHSSVRGFTSSML- 

CTACAGCAACCCGCCCGATCACGGCCGGTTGCGCCAGGTGGTGAGCTTCGCGTTCACCCC 

47500 + + + + + + 47559 

GATGTCGTTGGGCGGGCTAGTGCCGGCCAACGCGGTCCACCACTCGAAGCGCAAGTGGGG 
29 YSNPPDHGRLRQVVSFAFTP 

GCCCAAGGTGCGCCGGATGCACGGGGTGATCGAGGACATGACCGACCGGCTCCTCGACCG 
47560 + + + + + + 47619 

CGGGTTCCACGCGGCCTACGTGCCCCACTAGCTCCTGTACTGGCTGGCCGAGGAGCTGGC 
29 PKVRRMHGVIEDMTDRLLDR 

GATGGCCCGGCTCGGCTCCGGCGGCTCCCCGGTCGACCTCATAGCCGAGTTCGCCGCCCG 
47620 + + + + + + 47679 

CTACCGGGCCGAGCCGAGGCCGCCGAGGGGCCAGCTGGAGTATCGGCTCAAGCGGCGGGC 
29 MARLGSGGSPVDLIAEFAAR- 

GCTGCCCGTCGCGGTGATCAGCGAGATGATCGGCTTTCCGGCGAAGGACCAGGTGTGGTT 

47680 + + + + + + 47739

CGACGGGCAGCGCCACTAGTCGCTCTACTAGCCGAAAGGCCGCTTCCTGGTCCACACCAA 
29 LPVAVISEMIGFPAKDQVWF 

CCGCGACATGGCCTCCCGGGTCGCCGTGGCGACGGACGGTTTCACCGACCCCGGCGCGCT 

47740 + + + + + + 47799 

GGCGCTGTACCGGAGGGCCCAGCGGCACCGCTGCCTGCCAAAGTGGCTGGGGCCGCGCGA 
29 RDMASRVAVATDGFTDPGAL 

CACGGGGGCCGACGCCGCCATGGACGAGATGAGCGCCTACTTCGACGACCTCCTGGACCG 

47800 + + + + + + 47859 

GTGCCCCCGGCTGCGGCGGTACCTGCTCTACTCGCGGATGAAGCTGCTGGAGGACCTGGC 
29 TGADAAMDEMSAYFDDLLDR 

TCGCCGCCGCACCCCGGCCGACGACCTGGTCACCCTGCTCGCCGAGGCCCACGACGGCTC 
47860 + + + + + + 47919 

AGCGGCGGCGTGGGGCCGGCTGCTGGACCAGTGGGACGAGCGGCTCCGGGTGCTGCCGAG 
29 RRRTPADDLVTLLAEAHDGS 

CCCCGGGCGCCTGGACCACGACGAACTGATGGGCACCATGATGGTGCTGCTCACAGCCGG 
47920 + + + + + + 47979

GGGGCCCGCGGACCTGGTGCTGCTTGACTACCCGTGGTACTACCACGACGAGTGTCGGCC 
29 PGRLDHDELMGTMMVLLTAG- 

GTTCGAGACCACGAGCTTTCTGATCGGCCACGGGGCGATGATCGCCCTCGAACAACGGGC 

47980 + + + + + + 48039 

CAAGCTCTGGTGCTCGAAAGACTAGCCGGTGCCCCGCTACTAGCGGGAGCTTGTTGCCCG 
29 FETTSFLIGHGAMIALEQRA- 

GCACGCGGCCCGGCTGCGGGCCGAACCCGACTTCGCCGACGGCTACGTCGAGGAGATCCT 

48040 + + + + + + 48099 

CGTGCGCCGGGCCGACGCCCGGCTTGGGCTGAAGCGGCTGCCGATGCAGCTCCTCTAGGA 
29 HAARLRAEPDFADGYVEE I L 

CAGGTTCGAGCCGCCGGTCCACGTCACCAGCCGGTGGGCTGCCGAGGACCTCGACCTGCT 

48100 + + + + + + 48159 

GTCCAAGCTCGGCGGCCAGGTGCAGTGGTCGGCCACCCGACGGCTCCTGGAGCTGGACGA 
29 RFEPPVHVTSRWAAEDLDLL- 

GGGCCTGTCCGTACCGGCGGGCTCCAAGCTGGTCCTGATCCTGGCCGCCGCGAATCGCGA 

48160 + + + + + + 48219 

CCCGGACAGGCATGGCCGCCCGAGGTTCGACCAGGACTAGGACCGGCGGCGCTTAGCGCT 
29 GLSVPAGSKLVLI LAAANRD 

TCCCGGCCGCTACCCCGAGCCCGGCCGCTTCGACCCCGACCGCTACGCGCCCCGGCCGGG 

48220 + + + + + + 48279 

AGGGCCGGCGATGGGGCTCGGGCCGGCGAAGCTGGGGCTGGCGATGCGCGGGGCCGGCCC 
29 PGRYPEPGRFDPDRYAPRPG- 

CGGGCCGGAGGCCACCAGACCGCTGAGCTTCGGCGCGGGCGGCCACTTCTGCCTCGGCGC 

48280 + + + + + + 48339 

GCCCGGCCTCCGGTGGTCTGGCGACTCGAAGCCGCGCCCGCCGGTGAAGACGGAGCCGCG 
29 GPEATRPLSFGAGGHFCLGA- 

TCCGCTGGCGCGGCTGGAAGCCCGGATCGCGCTGCCGCGTCTGCTGCGCCGCTTCCCGGA 

48340 + + + + + + 48399 

AGGCGACCGCGCCGACCTTCGGGCCTAGCGCGACGGCGCAGACGACGCGGCGAAGGGCCT 
29 PLARLEARIALPRLLRRFPD- 

CCTGGCCGTGTCCGAGCCCCCCGTCTACCGCGACCGCTGGGTCGTCCGCGGCCTCGAAAC 

48400 + + + + + + 48459 

GGACCGGCACAGGCTCGGGGGGCAGATGGCGCTGGCGACCCAGCAGGCGCCGGAGCTTTG 
29 LAVSEPPVYRDRWVVRGLET- 

CTTTCCCGTGACCCTCGGGTCCTGAGCCCCCGCCGGCCGGAACACGTGACCGTCCCGGCC 

48460 + + + + + + 48519 

GAAAGGGCACTGGGAGCCCAGGACTCGGGGGCGGCCGGCCTTGTGCACTGGCAGGGCCGG 

29 FPVTLGS* (ORF29) 

GGCGGGTGCGCGCCCTCTCAGACGTACAGGGTGTTGGGCCCCTGACCACACAGCACCCGG 

48520 + + + + + + 48579 

CCGCCCACGCGCGGGAGAGTCTGCATGTCCCACAACCCGGGGACTGGTGTGTCGTGGGCC 

CCGTACAGCTCCAGGTTGGTGCTCGGGTTCATGCAGGTGCAGCGTGATGCTCTGGGCATC 

48580 + + + + + + 48639 

GGCATGTCGAGGTCCAACCACGAGCCCAAGTACGTCCACGTCGCACTACGAGACCCGTAG 
30 (ORF30)* APAAHHEPC 

GCTGCACGCGCTGGATCGGGACGTCGTTGTAGATCGAGGACCCGCCGCTCGCCTGGGCGA 

48640 + + + + + + 48699 

CGACGTGCGCGACCTAGCCCTGCAGCAACATCTAGCTCCTGGGCGGCGAGCGGACCCGCT 

30 RQVRQI PVDNYI S SGGSAQA 

GGATGTCCACCGACTCCTTGCCCAGTCGGCACGCCCGCCCCAGCAGGCCGCGGCACAGCA 

48700 + + + + + + 48759 

CCTACAGGTGGCTGAGGAACGGGTCAGCCGTGCGGGCGGGGTCGTCCGGCGCCGTGTCGT 
30 LIDVSEKGLRCARGLLGRCL

CCCGCTCCTCCAGCGTCCAGGCCTCGCCCGAAGCCCCCTTGGAGTCGACGAGGTCGGCCA 

48760 + + + + + + 48819 

GGGCGAGGAGGTCGCAGGTCCGGAGCGGGCTTCGGGGGAACCTCAGCTGCTCCAGCCGGT 
30 VREELTWAEGSAGKSDVLDA 

GCCGATGGGCGTGGAACCGTGCCTCGTCGGCCAGCAGGGTCGCCTCGCCGAGCTGCAGGT 

48820 + + + + + + 48879 

CGGCTACCCGCACCTTGGCACGGAGCAGCCGGTCGTCCCAGCGGAGCGGCTCGACGTCCA 
30 LRHAHFRAEDALLTAEGLQL 

GGGTGATCGGCGCCGAGCCCTGCTCCTCGTACTCGGTGTAGGTGATCTTGCGGCCGGGCA 

48880 + + + + + + 48939 

CCCACTAGCCGCGGCTCGGGACGAGGAGCATGAGCCACATCCACTAGAACGCCGGCCCGT 
30 HTI PASGQEEYETYTIKRGP 

GCCTCCCGCGGAAGACGTCCTGAGCGGCCGCGGCCAGTCCGGTCATGGTGCCGACCGACG 

48940 + + + + + + 48999 

CGGAGGGCGCCTTCTGCAGGACTCGCCGGCGCCGGTCAGGCCAGTACCACGGCTGGCTGC 
30 LRGRFVDQAAAALGTMTGVS 

AGGCCGAGGCCACGGCCAGCATCGGCGCCCGGAACATCGGTGATCCGGCGTTGAGTTCGG 

49000 + + + + + + 49059 

TCCGGCTCCGGTGCCGGTCGTAGCCGCGGGCCTTGTAGCCACTAGGCCGCAACTCAAGCC 
30 SASAVALMPARFMPSGANLE 

AGGCGTACTGCTGCTGGAGCACCGCGCCCAGCGGAAGGACGCGCTCCTGGGGAACGAAGA 

49060 + + + + + + 49119 

TCCGCATGACGACGACCTCGTGGCGCGGGTCGCCTTCCTGCGCGAGGACCCCTTGCTTCT 
30 SAYQQQLVAGLPLVREQPVF 

CGTCCGCGGCGATGGTGCTGACGCTTCCCGAGCCCCGGAGCCCCGAGGTGTGCCAGTCGT 

49120 + + + + + + 49179 

GCAGGCGCCGCTACCACGACTGCGAAGGGCTCGGGGCCTCGGGGCTCCACACGGTCAGCA 
30 VDAAITSVSGSGRLGSTHWD 

CGACGATCTGCAGCTGGTCGGTCGGCACCAGGGCCATCACGGGCTGCATGCCGCCGTCGG 

49180 + + + + + + 49239 

GCTGCTAGACGTCGACCAGCCAGCCGTGGTCCCGGTAGTGCCCGACGTACGGCGGCAGCC 
30 DVIQLQDTPVLAMVPQMGGD 

GGGTCGGTGAGACGGCGATCAGAACCTGCCAGTGACTGTGCCAGGCACCGCTGATGAAGC 

49240 + + + + + + 49299 

CCCAGCCACTCTGCCGCTAGTCTTGGACGGTCACTGACACGGTCCGTGGCGACTACTTCG 
30 PTPSVAILVQWHSHWAGSIF 

CCCACTTGCCGTTCACTACGACACCGCCGTCGACCGGGGCCGCCATGCCGCCGGGACTGA 

49300 + + + + + + 49359 

GGGTGAACGGCAAGTGATGCTGTGGCGGCAGCTGGCCCCGGCGGTACGGCGGCCCTGACT 
30 GWKGNVVVGGDVPAAMGGPS 

GGGTGCCGGAGACCCGGACATCCGGCCGGGAGAACACCTCGTCCTGCACGTGGTCGGGGA 

49360 + + + + + + 49419 

CCCACGGCCTCTGGGCCTGTAGGCCGGCCCTCTTGTGGAGCAGGACGTGCACCAGCCCCT 
30 LTGSVRVDPRSFVEDQVHDP 

AGAGGCCCGCCATCCAGGTGGGTATCCACCACACCGAGGCCGTCCAGGCGGCCGATCCGT 

49420 + + + + + + 49479 

TCTCCGGGCGGTAGGTCCACCCATAGGTGGTGTGGCTCCGGCAGGTCCGCCGGCTAGGCA 
30 FLGAMWTP IWWVSATWAASG 

CGCCGCGCGCCAGCTCGGCGGCCACGTCCACCAGGGTGCGGGCGTCGGACTCGAAGCCGC 

49480 + + + + + + 49539 

GCGGCGCGCGGTCGAGCCGCCGGTGCAGGTGGTCCCACGCCCGCAGCCTGAGCTTCGGCG 
30 DGRALEAAVDVLTRADSEFG 
CGTAACGGGCCGGCACGCGCATGCGGAAGATCCCGGCTTCGGCCATCGCCTCGACCGACT

49540 + + + + + + 49599 

GCATTGCCCGGCCGTGCGCGTACGCCTTCTAGGGCCGAAGCCGGTAGCGGAGCTGGCTGA 
30 GYRAPVRMRFI GAEAMAEVS 

CCTCGTGCAGCCGCCGGTTCTCCTCGGTCCAGGCCGCGTGGGACTGGAGCAGCGGCCTCA 

49600 + + + + + + 49659 

GGAGCACGTCGGCGGCCAAGAGGAGCCAGGTCCGGCGCACCCTGACCTCGTCGCCGGAGT 
30 EEHLRRNEETWAAHSQLLPR 

GCTTCGAGGCCCGTTCCACCAGTTCGGTACGGGCGGGCGTAGACGTCTGGTCCACTCGAT 

49660 + + + + + + 49719 

CGAAGCTCCGGGCAAGGTGGTCAAGCCATGCCCGCCCGCATCTGCAGACCAGGTGAGCTA 

30 LKSAREVLETRAPTSTQDV 
(ORF30)

CCTCCAGGAATCATGAGACGCCCTGTCCGCGGTATGCGGAAGCAGGCGTCTGCGCGCATC 

49720 + + + + + + 49779 

GGAGGTCCTTAGTACTCTGCGGGACAGGCGCCATACGCCTTCGTCCGCAGACGCGCGTAG 

GGTCAGGACGGCGTCGCCCTGCTCCCGCATGGTTCACCGAGTTCCGCGGACGTCGCATCT 

49780 + + + + + + 49839 

CCAGTCCTGCCGCAGCGGGACGAGGGCGTACCAAGTGGCTCAAGGCGCCTGCAGCGTAGA 

CCTTGATTGCCGGTCACCTACCCCGATGCCGATCGGGCTGGTGCGACAGCGCATCCCACG 

49840 + + + + + + 49899 

GGAACTAACGGCCAGTGGATGGGGCTACGGCTAGCCCGACCACGCTGTCGCGTAGGGTGC 

AGAAGTCCACGAACGGTCCGGGAAGCCAGAATGTGCTTCTCGGCCGGAGTCACGGCCGGC 

49900 + + + + + + 49959 

TCTTCAGGTGCTTGCCAGGCCCTTCGGTCTTACACGAAGAGCCGGCCTCAGTGCCGGCCG 

GCCGGCGCCCGTCGCCGGTCACGCCGGACCACGCCCGGACCGGTCATGGAGGCAGCCCAT 

49960 + + + + + + 50019 

CGGCCGCGGGCAGCGGCCAGTGCGGCCTGGTGCGGGCCTGGCCAGTACCTCCGTCGGGTA 

GAGTGACAACGACAGTCCGTCCCGGGTGCCGGCCGCGGTGGCACCCGCCACCGCGAAACC 

50020 + + + + + + 50079 

CTCACTGTTGCTGTCAGGCAGGGCCCACGGCCGGCGCCACCGTGGGCGGTGGCGCTTTGG 

GTCGGCCGGCACGGTCCTCGGCGCCGCGGTGGCTTCGCCCGCCGCCTACACCGCGGCGAC 

50080 + + + + + + 50139 

CAGCCGGCCGTGCCAGGAGCCGCGGCGCCACCGAAGCGGGCGGCGGATGTGGCGCCGCTG 

CGCCCAGGAAGCGGCGACCGCGCTGGTCCGCATGCTGATGGAACAGATGGTGCTCGGTCC 

50140 + + + + + + 50199 

GCGGGTCCTTCGCCGCTGGCGCGACCAGGCGTACGACTACCTTGTCTACCACGAGCCAGG 

CGGCGCGGTCGGTCCCGAGACCCGCGCGGACGGCCCGGCGGGGCGGACCGGCTCCGGCCA 

50200 + + + + + + 50259 

GCCGCGCCAGCCAGGGCTCTGGGCGCGCCTGCCGGGCCGCGCCGCCTGGCCGAGGCCGGT 

CGGCCCGGCGCCGCAGACCGGACCGGACGCGCCGGGCGAACCCCCGCCCACGTGGGCGCC 

50260 + + + + + + 50319 

GCCGGGCCGCGGCGTCTGGCCTGGCCTGCGCGGCCCGCTTGGGGGCGGGTGCACCCGCGG 

GAACCTCGACGACGGGAAGGTAGGAGGACGATGAGGCCGCTCGTTCGGGCAGTGCTGCGG 

50320 + + + + + + 50379 

CTTGGAGCTGCTGCCCTTCCATCCTCCTGCTACTCCGGCGAGCAAGCCCGTCACGACGCC 
31 (ORF31) MRPLVRAVLR- 

GGTTCCCTGCGGCAGGTGAGGTACGTGGACGTGGTCTCCCCGCGCCGGGCGCGCTCCCTG 

50380 + + + + + + 50439 

CCAAGGGACGCCGTCCACTCCATGCACCTGCACCAGAGGGGCGCGGCCCGCGCGAGGGAC 

31 GSLRQVRYVDVVSPRRARSL 
GTGGCGCGGGTGTACCGGGAGACCGAGGAGCAGTTCGGCGTGCTCGCGCCCCCCCTGGCC

50440 + + + + + + 50499 

CACCGCGCCCACATGGCCCTCTGGCTCCTCGTCAAGCCGCACGAGCGCGGGGGGGACCGG 
31 VARVYRETEEQFGVLAPPLA 

CTCCACTCGCCCGCCGCGGCGTCGCTGGCCGCGACGTGGCTCATGCTGCGGGAGACACTG 

50500 + + + + + + 50559 

GAGGTGAGCGGGCGGCGCCGCAGCGACCGGCGCTGCACCGAGTACGACGCCCTCTGTGAC 
31 LHSPAAASLAATWLMLRETL 

CTGGTCGACGGGCGGGTGAGCCGGGCGGTGAAGGAGACGGTCGCCACCGAGGTCTCCCGT 

50560 + + + + + + 50619 

GACCAGCTGCCCGCCCACTCGGCCCGCCACTTCCTCTGCCAGCGGTGGCTCCAGAGGGCA 
31 LVDGRVSRAVKETVATEVSR 

GCCAACGACTGTCCGTACTGCGTCCAGGTCCATCAGGCGGTACTCGGGACACTGCCTCCG 

50620 + + + + + + 50679 

CGGTTGCTGACAGGCATGACGCAGGTCCAGGTAGTCCGCCATGAGCCCTGTGACGGAGGC 
31 ANDCPYCVQVHQAVLGTLPP 

GACGGCGGCCAGGCCGGGCTCCTGCGGTGGGTCCGGGAGGCAGGCCGACGGCCCGGCGGC 

50680 + + + + + + 50739 

CTGCCGCCGGTCCGGCCCGAGGACGCCACCCAGGCCCTCCGTCCGGCTGCCGGGCCGCCG 
31 DGGQAGLLRWVREAGRRPGG 

GGTGCGGTGGGCGGCGGGCGGCCGCTTCCGTTCAGCGGTGAACAGGCACCGGAACTGTGC 

50740 + + + + + + 50799 

CCACGCCACCCGCCGCCCGCCGGCGAAGGCAAGTCGCCACTTGTCCGTGGCCTTGACACG 
31 GAVGGGRPLPFSGEQAPELC 

GGCGTCGTGGTCACGTTCCACTACATCAACCGCATGGTCTCCCTCTTCCTCGACGACTCC 

50800 + + + + + + 50859 

CCGCAGCACCAGTGCAAGGTGATGTAGTTGGCGTACCAGAGGGAGAAGGAGCTGCTGAGG 
31 GVVVTFHYINRMVSLFLDDS 

CCCATGCCGACCCGGACGCCGACACCGTTGCGCGGGCCCATCATGAGGACCACCGCACTG 

50860 + + + + + + 50919 

GGGTACGGCTGGGCCTGCGGCTGTGGCAACGCGCCCGGGTAGTACTCCTGGTGGCGTGAC 
31 PMPTRTPTPLRGPIMRTTAL 

GCCATGCGTCCCGTCGGCCCGGGGCTGCTGACACCGGGCGCATCGCTCGGCCTGCTGCCT 

50920 + + + + + + 50979 

CGGTACGCAGGGCAGCCGGGCCCCGACGACTGTGGCCCGCGTAGCGAGCCGGACGACGGA 
31 AMRPVGPGLLTPGASLGLLP 

CCGGCTCCCCTGCCGCCCGGACTGGAGTGGGCCGAGGGCAACCCTTTCGTGGCCCAGGCG 

50980 + + + + + + 51039 

GGCCGAGGGGACGGCGGGCCTGACCTCACCCGGCTCCCGTTGGGAAAGCACCGGGTCCGC 
31 PAPLPPGLEWAEGNPFVAQA 

CTGGGGCGTGCCGTCGCCGCTGTGGACCAGGGAGCGCACTGGGTGCCCGAACCGGTCCGG 

51040 + + + + + + 51099 

GACCCCGCACGGCAGCGGCGACACCTGGTCCCTCGCGTGACCCACGGGCTTGGCCAGGCC 
31 LGRAVAAVDQGAHWVPEPVR 

GAGCGGCTGCGCACACGTCTGGACACCTGGGACGGATCGGCGCCGGGCCTCGGCCGGGGA 

51100 + + + + + + 51159 

CTCGCCGACGCGTGTGCAGACCTGTGGACCCTGCCTAGCCGCGGCCCGGAGCCGGCCCCT 
31 ERLRTRLDTWDGSAPGLGRG 

TGGCTCGACGAGGCCGTGTCCGGCCTGCCGCCCCAGGACGTGCCCGCGGCACGGCTGGCG 

51160 + + + + + + 51219 

ACCGAGCTGCTCCGGCACAGGCCGGACGGCGGGGTCCTGCACGGGCGCCGTGCCGACCGC 
31 WLDEAVSGLPPQDVPAARLA 

CTGCTGACGGCCTTCGCCCCCTACCAGGTGCTCCCGGACGACGTCGAGGAGTTCAGACGG 
51220 + + + + + + 51279

GACGACTGCCGGAAGCGGGGGATGGTCCACGAGGGCCTGCTGCAGCTCCTCAAGTCTGCC 
31 LLTAFAPYQVLPDDVEEFRR 

CGTCGGCCCACCGACCGCGAACTCGTCGAGCTCACGTCCTACGCCGCGCTGACCACGGCC 

51280 + + + + + + 51339 

GCAGCCGGGTGGCTGGCGCTTGAGCAGCTCGAGTGCAGGATGCGGCGCGACTGGTGCCGG 
31 RRPTDRELVELTSYAALTTA 

GTCCGTGTCGGTCGCACGCTCGTCGTGCCCGACGCCGCCGGGCCGGGATGAACGGCCCCG 

51340 + + + + + + 51399 

CAGGCACAGCCAGCGTGCGAGCAGCACGGGCTGCGGCGGCCCGGCCCTACTTGCCGGGGC 

31 VRVGRTLVVPDAAGPG* (ORF31) 

CAACGGCTCGGGAAGGCTGTCTCACGGCCGGAGGCGTACGCCGGTGAGGTGCTCGGACTC 

51400 + + + + + + 51459 

GTTGCCGAGCCCTTCCGACAGAGTGCCGGCCTCCGCATGCGGCCACTCCACGAGCCTGAG 

(ORF32) * PRLRVGTLHE S E - 

CTCCCAGAGGCGGCGCCGGGCCCTGGGGTCGACGGCTGCTCCGCCGGGGCGCACGAGCCC 

51460 + + + + + + 51519 

GAGGGTCTCCGCCGCGGCCCGGGACCCCAGCTGCCGACGAGGCGGCCCCGCGTGCTCGGG 

32 EWLRRRARPDVAAGGPRVLG- 

GGGTGCGCCCCGGGTCTCGGTCACGCCGAGGGGCCCGTAGAACTCGCCCCCGCGCGCGCC 

51520 + + + + + + 51579 

CCCACGCGGGGCCCAGAGCCAGTGCGGCTCCCCGGGCATCTTGAGCGGGGGCGCGCGCGG 
32 PAGRTETVGLPGYFEGGRAG- 

GGGATCGGTGGCCGCCCGCAGACCAGGCAGCATCCCCGCCGCGGCGGGCTGCAGGAACAA 

51580 + + + + + + 51639 

CCCTAGCCACCGGCGGGCGTCTGGTCCGTCGTAGGGGCGGCGCCGCCCGACGTCCTTGTT 
32 PDTAARLGPLMGAAAPQLFL- 

CGGGGCGAGCGGGGAGCCGAGCCTGCGCACGGGCGCGGGAAAGTCCCGGCCCAGACCGGT 

51640 + + + + + + 51699 

GCCCCGCTCGCCCCTCGGCTCGGACGCGTGCCCGCGCCCTTTCAGGGCCGGGTCTGGCCA 
32 PAL PSGLRRVPAPFDRGLGT- 

CGCGGTCAGCCCGGGATGAGCGGCGAGCGAGGCCAGTTCCGCGCCGGACTCCGCCAGTCT 

51700 + + + + + + 51759 

GCGCCAGTCGGGCCCTACTCGCCGCTCGCTCCGGTCAAGGCGCGGCCTGAGGCGGTCAGA 
32 ATLGPHAALSALEAGS EALR- 

GTGATGGAGTTCCAGCGCGAACATGAGGTTGGCCAGCTTGGACTGGTTGTAGGCCCGGTA 

51760 + + + + + + 51819 

CACTACCTCAAGGTCGCGCTTGTACTCCAACCGGTCGAACCTGACCAACATCCGGGCCAT 
32 HHLELAFMLNALKSQNYARY-

CCGGCTGTAGCGGCGTTCGCCGTGAAGGTCGCTGAAGTCGATGCGCCCCAGCCGGTGCAG 

51820 + + + + + + 51879 

GGCCGACATCGCCGCAAGCGGCACTTCCAGCGACTTCAGCTACGCGGGGTCGGCCACGTC 
32 RSYRREGHLDSFDIRGLRHL-

ATAGCTGCTGATCGTCACGACCCGCGCGCCCGGCGCGGCCCGCAGGCTGTCCAGGAGCAG 

51880 + + + + + + 51939 

TATCGACGACTAGCAGTGCTGGGCGCGCGGGCCGCGCCGGGCGTCCGACAGGTCCTCGTC 
32 YSSITVVRAGPAARLSDLLL- 

GCCGGTGAGGGCGAAGTGCCCCAGGTGGTTCGTGGCGAACTGGAGTTCGTGACCGTCCGG 

51940 + + + + + + 51999 

CGGCCACTCCCGCTTCACGGGGTCCACCAAGCACCGCTTGACCTCAAGCACTGGCAGGCC 
32 GTLAFHGLHNTAFQLEHGDP- 

GGTGCGGGCCCGGTCGGTCCACATCACGCCCGCGTTGTTGACCAGCAGGTGGATGCGCGG

52000 + + + + + + 52059

CCACGCCCGGGCCAGCCAGGTGTAGTGCGGGCGCAACAACTGGTCGTCCACCTACGCGCC

TRARDTWMVGANNVLLHIRP-

GAAGCGGTCGCGCAGTTCCTCGGCGCCGGCACGCACCGACGCGAGACGGGAAAGATCCAG 

52060 + + + + + + 52119 

CTTCGCCAGCGCGTCAAGGAGCCGCGGCCGTGCGTGGCTGCGCTCTGCCCTTTCTAGGTC 

FRDRLEEAGARVSALRSLDL- 

CCGTCTGACCGTCAGTTGCGCCGACGGCACCCGGCTTTGGATGCGGGCCGCCGCGGCGAC 

52120 + + + + + + 52179 

GGCAGACTGGCAGTCAACGCGGCTGCCGTGGGCCGAAACCTACGCCCGGCGGCGCCGCTG 

RRVTLQASPVRSQIRAAAAV-

CCCGCGGTCCGGATCGCGCACGGCCAGCACCACGTGGGCGCCGTGCCGGGCGAGCTCCTG 

52180 + + + + + + 52239 

GGGCGCCAGGCCTAGCGCGTGCCGGTCGTGGTGCACCCGCGGCACGGCCCGCTCGAGGAC 

GRDPDRVALVVHAGHRALEQ-

CGCCAGGTGCAGTCCGATGCCGGAGCTGGCACCGGTGACCACCGCGGTGGTTCCGGTACG 

52240 + + + + + + 52299 

GCGGTCCACGTCAGGCTACGGCCTCGACCGTGGCCACTGGTGGCGCCACCAAGGCCATGC 

ALHLGIGSSAGTVVATTGTR-

GTCCGGGACATCGGCGGCGCTCCAGCGTCGCCGCGTTCTCATCGGTCGTCCCTCCCGGGG 

52300 + + + + + + 52359

CAGGCCCTGTAGCCGCCGCGAGGTCGCAGCGGCGCAAGAGTAGCCAGCAGGGAGGGCCCC 
DPVDAASWRRRTRM (ORF32) 



GATGCGTCAGCCGGCCTGGGCCATCGCGGCCCGGTAGCCGTTGGCGACGATCTGCCGGGC 

52360 + + + + + + 52419

CTACGCAGTCGGCCGGACCCGGTAGCGCCGGGCCATCGGCAACCGCTGCTAGACGGCCCG 



GGAGTGCTCGTAGTACTCGTCGTCCTTCGGCAGCTCCGTGGCGAGACCGCTGACGTACCG 

52420 + + + + + + 52479 

CCTCACGAGCATCATGAGCAGCAGGAAGCCGTCGAGGCACCGCTCTGGCGACTGCATGGC 

GTTGAACATGCAGAACGCGGCGGCGATCAGAACGGTGTCGTGCAGAGCGGTGTCGTCCGC 

52480 + + + + + + 52539 

CAACTTGTACGTCTTGCGCCGCCGCTAGTCTTGCCACAGCACGTCTCGCCACAGCAGGCG 

TCCCTCGGCCCGCGCCGAGGCGATCACCCCTGCGGAGACCGGGCGCGCCGCGCTCTGGAC 

52540 + + + + + + 52599 

AGGGAGCCGGGCGCGGCTCCGCTAGTGGGGACGCCTCTGGCCCGCGCGGCGCGAGACCTG 



CTCGGCGGCGACGGCCAGCAGCGCGCGCGTCCTGCCGTCGATGGGCGCGGTGGCGGGGTC

52600 + + + + + + 52659

GAGCCGCCGCTGCCGGTCGTCGCGCGCGCAGGACGGCAGCTACCCGCGCCACCGCCCCAG

GGCGAGGACGGCCTCGACGAGCTGCCGGCCTCCCGGCAGCTGCGCGGCGGCGAAGGCCCC 

52660 + + + + + + 52719

CCGCTCCTGCCGGAGCTGCTCGACGGCCGGAGGGCCGTCGACGCGCCGCCGCTTCCGGGG 



GTGGGAGGCGGCGCAGAACTCGGTGGAGTTGAGATGCGAGACGTACGCCGCGATGAGCTC 

52720 + + + + + + 52779 

CACCCTCCGCCGCGTCTTGAGCCACCTCAACTCTACGCTCTGCATGCGGCGCTACTCGAG 

GCGTTGCCCCGGTTCCAGCGAGGACGGCGCCCGCAGCAGGGCGTTCGCGAGATCGCCCAG 

52780 + + + + + + 52839 

CGCAACGGGGCCAAGGTCGCTCCTGCCGCGGGCGTCGTCCCGCAAGCGCTCTAGCGGGTC 



CGGTGCTGCGGTGCCGGGGTGGTGAGCCATCAGACCACTGATGCCGGGGAGGTCGTTGTC 

52840 + + + + + + 52899 

GCCACGACGCCACGGCCCCACCACTCGGTAGTCTGGTGACTACGGCCCCTCCAGCAACAG 

GAGTGCTATGTGGGGCACGGCTCTTCCTTCCGGGTGGACGAGGGGCGGACGGCGGCGGAT

52900 + + + + + + 52959

CTCACGATACACCCCGTGCCGAGAAGGAAGGCCCACCTGCTCCCCGCCTGCCGCCGCCTA

CAGGGCCATTCGACTTCGTCGTCGGCGGCCGCGCAGATGCGGGTGAAGGGCCATTCCACG 

52960 + + + + + + 53019

GTCCCGGTAAGCTGAAGCAGCAGCCGCCGGCGCGTCTACGCCCACTTCCCGGTAAGGTGC 

TCTTCCCCTCCCGTTGCGGAGTGGGCGGAGGCCGTGGTGAAGAGGGTGACGAGTCCGAAC 

53020 + + + + + + 53079 

AGAAGGGGAGGGCAACGCCTCACCCGCCTCCGGCACCACTTCTCCCACTGCTCAGGCTTG 

GTGCCGAAGAGGAGGGACAGTCGGGCAACGTGAAGTGCGGTACCCATGCGAGCTCCTAGC 

53080 + + + + + + 53139 

CACGGCTTCTCCTCCCTGTCAGCCCGTTGCACTTCACGCCATGGGTACGCTCGAGGATCG 

GAGGGCGGCGTGACCGCGGGACGGTGAGACCTCGTGATGCCAGGAAGCTAGCGAATCGGA 

53140 + + + + + + 53199

CTCCCGCCGCACTGGCGCCCTGCCACTCTGGAGCACTACGGTCCTTCGATCGCTTAGCCT 

CTGAGGGTGGCAACGATATGCCAGACTTTGGCAACTTGCCTGTGTATCAGCCGGACTGTC 

53200 + + + + + + 53259 

GACTCCCACCGTTGCTATACGGTCTGAAACCGTTGAACGGACACATAGTCGGCCTGACAG 

(ORF33) V Y Q P D C R 

GGCCGCTGGTAAAGACGGAACGGCGAGATCCCGCGACCGCGTCGCAGAGCAGCAGGGTCT 

53260 + + + + + + 53319 

CCGGCGACCATTTCTGCCTTGCCGCTCTAGGGCGCTGGCGCAGCGTCTCGTCGTCCCAGA 

PLVKTERRDPATASQSSRVC- 

GCTCACCCAGCGTCGGGGCGGCCAGCATGTCGCGTACCGGGAGCGTGACGCCCAGCTCGC 

53320 + + + + + + 53379 

CGAGTGGGTCGCAGCCCCGCCGGTCGTACAGCGCATGGCCCTCGCACTGCGGGTCGAGCG 

SPSVGAASMSRTGSVTPSSR- 

GGTTGATCCTGCGGACCAGCCGGGTGATGAGCAGGGAGTCGCCGCCGTGGGCGAAGAAAT 

53380 + + + + + + 53439 

CCAACTAGGACGCCTGGTCGGCCCACTACTCGTCCCTCAGCGGCGGCACCCGCTTCTTTA

LILRTSRVMSRESPPWAKKS-

CAGCACCTTCGGAGGGGTCCGGGAAGCCGAGCAGGTCACCCCAGCCGCGCACCAGTACCT 

53440 + + + + + + 53499 

GTCGTGGAAGCCTCCCCAGGCCCTTCGGCTCGTCCAGTGGGGTCGGCGCGTGGTCATGGA 

APSEGSGKPSRSPQPRTSTW- 

GGCGGATGTCGCCGGTGGTGACGACCGTGCGCCGGGAGCCCCGACGTGCCGAGCGCAGCC 

53500 + + + + + + 53559

CCGCCTACAGCGGCCACCACTGCTGGCACGCGGCCCTCGGGGCTGCACGGCTCGCGTCGG 

RMSPVVTTVRREPRRAERSR-

GCGAGGCATGCACCAGCGCCACCTGGTCGCCGAGGTTGCGCCGCGACAGCTCGCGCAGCG 

53560 + + + + + + 53619 

CGCTCCGTACGTGGTCGCGGTGGACCAGCGGCTCCAACGCGGCGCTGTCGAGCGCGTCGC 

EACTSATWSPRLRRDSSRSD-

ACACCGTGACGCCGAACCTCTCGGTGATCCTGCGGACCAGCCGCGTGATCAGCAGCGTGT 

53620 + + + + + + 53679 

TGTGGCACTGCGGCTTGGAGAGCCACTAGGACGCCTGGTCGGCGCACTAGTCGTCGCACA 

TVTPNLSVILRTSRVISSVS-

CCCCGCCGCGCGCGAAGAAATCCGAATGCTCGGTGAGGTCGGAGCGGCCGAGGAGCTCGC 

53680 + + + + + + 53739 

GGGGCGGCGCGCGCTTCTTTAGGCTTACGAGCCACTCCAGCCTCGCCGGCTCCTCGAGCG 

PPRAKKSECSVRSERPRSSL- 

TCCACGCGCCGACCATGAACTCCCCCACGTCACCGAGCCGGTGCTCGTCGCCGTCGGGGC 

53740 + + + + + + 53799 

AGGTGCGCGGCTGGTACTTGAGGGGGTGCAGTGGCTCGGCCACGAGCAGCGGCAGCCCCG

CCTTCGGCGCGCCGGATCCCGCGGAACGGTTCCGGCCGGAGACGGCAGAGCGGTCACTGG

53800 + + + + + + 53859 

GGAAGCCGCGCGGCCTAGGGCGCCTTGCCAAGGCCGGCCTCTGCCGTCTCGCCAGTGACC 
FGAPDPAERFRPETAERSLV- 

TCACTTTCGCCACCTCCAGGGGCATGTGTCGGCTGCATCGGCTTCCCGCCACGGTACGGG 

53860 + + + + + + 53919 

AGTGAAAGCGGTGGAGGTCCCCGTACACAGCCGACGTAGCCGAAGGGCGGTGCCATGCCC 

TFATSRGMCRLHRLPATVRE-

AGCACATGTTGCATGGCAATACCTTTCCAAGTCGGTGGCAACCCTCCTTGCCATCCACCC 

53920 + + + + + + 53979 

TCGTGTACAACGTACCGTTATGGAAAGGTTCAGCCACCGTTGGGAGGAACGGTAGGTGGG 

HMLHGNTFPSRWQPSLPSTH- 

ACTGCAGTTGGGCGAGATGTGTAGGCATTCGAGGTCCGCAGGTTTGCCAAGCCGCGCGCG 

53980 + + + + + + 54039 

TGACGTCAACCCGCTCTACACATCCGTAAGCTCCAGGCGTCCAAACGGTTCGGCGCGCGC 

CSWARCVGIRGPQVCQAARD-

ACCGGCATACTCTCTGGCACAACTGGAATGAGTAGCGTGGCAGGCCACGGGGACCGGGCC 

54040 + + + + + + 54099 

TGGCCGTATGAGAGACCGTGTTGACCTTACTCATCGCACCGTCCGGTGCCCCTGGCCCGG 

RHTLWHNWNE* (ORF33)

GGGCCAGGAACCTTCGTCCTCCATCTATTCGCTGGGGCGTGCACGTGTTGGAGCAGCCAT 

54100 + + + + + + 54159 

CCCGGTCCTTGGAAGCAGGAGGTAGATAAGCGACCCCGCACGTGCACAACCTCGTCGGTA 

CTTTCGGCCGTCGCCTGAGGCAGCTGAGGACCGAGCGGGGTCTTTCCCAGGCCGCGCTCG 

54160 + + + + + + 54219 

GAAAGCCGGCAGCGGACTCCGTCGACTCCTGGCTCGCCCCAGAAAGGGTCCGGCGCGAGC 

CGGGGGACGGCATGTCTACGGGCTATCTCTCGCGCCTGGAGTCGGGCGCCCGGCAGCCCT 

54220 + + + + + + 54279 

GCCCCCTGCCGTACAGATGCCCGATAGAGAGCGCGGACCTCAGCCCGCGGGCCGTCGGGA 

(ORF34) MSTGYLSRLESGARQPS- 

CCGATCGCGCCGTCGCCCACCTGGCCGGACAACTCGGCATCAGCCCGTCGGAGTTCGAAG 

54280 + + + + + + 54339 

GGCTAGCGCGGCAGCGGGTGGACCGGCCTGTTGAGCCGTAGTCGGGCAGCCTCAAGCTTC 

DRAVAHLAGQLGISPSEFEG-

GGTCCCGGGCCACCTCGCTCGCCCAGATCCTCTCCCTCTCCACTTCCCTGGAGTCCGACG 

54340 + + + + + + 54399 

CCAGGGCCCGGTGGAGCGAGCGGGTCTAGGAGAGGGAGAGGTGAAGGGACCTCAGGCTGC 

SRATSLAQILSLSTSLESDE- 

AGACCAGTGAGCTTCTCGCCGAGGCGGTACGTTCCGCGCATGGCCAGGATCCGATGCTCC 

54400 + + + + + + 54459 

TCTGGTCACTCGAAGAGCGGCTCCGCCATGCAAGGCGCGTACCGGTCCTAGGCTACGAGG 

TSELLAEAVRSAHGQDPMLR-

GCTGGCAGGCCCTGTGGCTGCTGGGACAGTGGAAGCGCCGGCACGGCGACTCGGCCGGCG 

54460 + + + + + + 54519 

CGACCGTCCGGGACACCGACGACCCTGTCACCTTCGCGGCCGTGCCGCTGAGCCGGCCGC 

WQALWLLGQWKRRHGDSAGE-

AGCACGGCTACCTCCAGCGTCTGGTGACGCTGAGTGAGGAGATCGGCCTGGCCGAGTTGC 

54520 + + + + + + 54579 

TCGTGCCGATGGAGGTCGCAGACCACTGCGACTCACTCCTCTAGCCGGACCGGCTCAACG 

HGYLQRLVTLSEEIGLAELR-

GCGCACGGGCCCTGACCCAGTTCGCCCGGTCGCTGCGGGTACTGGGCGAGATCGTTCCGG

54580 + + + + + + 54639

CGCGTGCCCGGGACTGGGTCAAGCGGGCCAGCGACGCCCATGACCCGCTCTAGCAAGGCC 
34 ARALTQFARSLRVLGEIVPA-

CGGTGGAGGCTGCCGCCGCCGCCCACCGGCTCGCGGTGGACCATGCGCTGTCCAGCCAGG 

54640 + + + + + + 54699 

GCCACCTCCGACGGCGGCGGCGGGTGGCCGAGCGCCACCTGGTACGCGACAGGTCGGTCC 
34 VEAAAAAHRLAVDHALSSQD- 

ACAGGGCCGCTTCGCTGCTGGTTCTGGTGTCGGTGGAGGCCGAGGCGGGACGGATGCCCG 

54700 + + + + + + 54759 

TGTCCCGGCGAAGCGACGACCAAGACCACAGCCACCTCCGGCTCCGCCCTGCCTACGGGC 
34 RAASLLVLVSVEAEAGRMPD- 

ACGCCCGGCGCCACGCCGACGAACTGACCGTCCTGGTGAGGGGACGGTCCGACACTCTGT 

54760 + + + + + + 54819 

TGCGGGCCGCGGTGCGGCTGCTTGACTGGCAGGACCACTCCCCTGCCAGGCTGTGAGACA 
34 ARRHADELTVLVRGRSDTLW- 

GGGCCGAGGCGTTGTGGACGGCGGGTGCGTTGAAGGTGCGGCAGGGCGAGTTCGCCGCGG 

54820 + + + + + + 54879 

CCCGGCTCCGCAACACCTGCCGCCCACGCAACTTCCACGCCGTCCCGCTCAAGCGGCGCC 
34 AEALWTAGALKVRQGEFAAA-

CCGAGGTCCTTTTCCAGGAGGCTCTGGACGGGTTCGACAGCCGGGAGAACCTGACGATCT 

54880 + + + + + + 54939 

GGCTCCAGGAAAAGGTCCTCCGAGACCTGCCCAAGCTGTCGGCCCTCTTGGACTGCTAGA 
34 EVLFQEALDGFDSRENLTIW- 

GGCTGCGGCTGCGCATCGCGATGGCCGAACTCCACCTGCAGAAACTTCCTCCCGAGCCCG 

54940 + + + + + + 54999 

CCGACGCCGACGCGTAGCGCTACCGGCTTGAGGTGGACGTCTTTGAAGGAGGGCTCGGGC 
34 LRLRIAMAELHLQKLPPEPD- 

ACGCCGCGCAGCTCTGCATCGAGGCGGCGGAGGCGGCCCTTCCCTTTGCCCGCACATCCG 

55000 + + + + + + 55059 

TGCGGCGCGTCGAGACGTAGCTCCGCCGCCTCCGCCGGGAAGGGAAACGGGCGTGTAGGC 
34 AAQLCIEAAEAALPFARTSA- 

CTCTGGAACAGTCCCTCGCCGCTCTGCGGGCGCGCCTCGCCTTCCATGAGGGCAGGTTCG 

55060 + + + + + + 55119 

GAGACCTTGTCAGGGAGCGGCGAGACGCCCGCGCGGAGCGGAAGGTACTCCCGTCCAAGC 
34 LEQSLAALRARLAFHEGRFA-

CCGATGCCCGCGCGTTGTTGGAGAGGCTCGGCAGGACCGAGCTCCGGCTGCCCTATCAGA 

55120 + + + + + + 55179 

GGCTACGGGCGCGCAACAACCTCTCCGAGCCGTCCTGGCTCGAGGCCGACGGGATAGTCT 
34 DARALLERLGRTELRLPYQS-

GCCGGATCCGCCTGGAGGTCCTCGGTCATCAGCTGCGCATCCTGAGCGGGGAGGAGGAGG 

55180 + + + + + + 55239 

CGGCCTAGGCGGACCTCCAGGAGCCAGTAGTCGACGCGTAGGACTCGCCCCTCCTCCTCC 
34 RIRLEVLGHQLRILSGEEEE- 

AAGGCCTGGCCGGCCTCCAGCTCCTGGCCGAGGAGGCGCAGGAGAACTCCAACATCAACC 

55240 + + + + + + 55299 

TTCCGGACCGGCCGGAGGTCGAGGACCGGCTCCTCCGCGTCCTCTTGAGGTTGTAGTTGG 
34 GLAGLQLLAEEAQENSNINL- 

TCGCCGCGGAGATCTGGCGGCTCGCGGCGGAATGCCTGATGCGGGCGCGCGGGAAGGTCC 

55300 + + + + + + 55359

AGCGGCGCCTCTAGACCGCCGAGCGCCGCCTTACGGACTACGCCCGCGCGCCCTTCCAGG 
34 AAEIWRLAAECLMRARGKVR-

GCGGCGCCACCGGCGGCTGACGCCGCGCCGGTTCGCGAGGTCCACCGCGCCGCCGTGGCC

55360 + + + + + + 55419

CGCCGCGGTGGCCGCCGACTGCGGCGCGGCCAAGCGCTCCAGGTGGCGCGGCGGCACCGG

34 G A T G G * (ORF34) 

ACCGCCGTCGGCGTGAGGCGCCGGCGTGTGCCGCCCCCCACGGTTGCTCGCCCTTGGTGG 

55420 + + + + + + 55479 

TGGCGGCAGCCGCACTCCGCGGCCGCACACGGCGGGGGGTGCCAACGAGCGGGAACCACC 

TGCATCTGTTGGCACATGTGTACCTCCTACACAGTCAATTGTTGCCAAAATTGTCGAACC 

55480 + + + + + + 55539

ACGTAGACAACCGTGTACACATGGAGGATGTGTCAGTTAACAACGGTTTTAACAGCTTGG 

GAATGGCAATTGCTTGCCTTTGCTGAAGAGGCGTGCTGATATGCAAGTCAAGTAGCCTCC 

55540 + + + + + + 55599 

CTTACCGTTAACGAACGGAAACGACTTCTCCGCACGACTATACGTTCAGTTCATCGGAGG 

TCCGATCTCGGGCGGCCATATGGGAAACATCGAGTTGAGCGGCGATGGCGTTCGTCAGTG

55600 + + + + + + 55659

AGGCTAGAGCCCGCCGGTATACCCTTTGTAGCTCAACTCGCCGCTACCGCAAGCAGTCAC 

CTGCCGTTCTGGCCAGGCAACTGATGTCGATGGGGATGGCAAGATTTTGCCGAAAACCGA 

55660 + + + + + + 55719 

GACGGCAAGACCGGTCCGTTGACTACAGCTACCCCTACCGTTCTAAAACGGCTTTTGGCT 

TACATCTCTGTCCGTCCCGGACAGCCTTCGCCCCCCGGGTGACACTGCTCCGGCATGGCT 

55720 + + + + + + 55779 

ATGTAGAGACAGGCAGGGCCTGTCGGAAGCGGGGGGCCCACTGTGACGAGGCCGTACCGA 

CCGGTTTCTCGTCGCCCGGCCGACGGACCGCACCGTCCGGAACGAGGCGCCGGTGTGCGT 

55780 + + + + + + 55839 

GGCCAAAGAGCAGCGGGCCGGCTGCCTGGCGTGGCAGGCCTTGCTCCGCGGCCACACGCA 

CCGCTGATGGGCACAGCGGCCTCGGCCGCAGCAGGTTCCCACCGAGAAGAATGCCGAGGC 

55840 + + + + + + 55899 

GGCGACTACCCGTGTCGCCGGAGCCGGCGTCGTCCAAGGGTGGCTCTTCTTACGGCTCCG 

CCAGCCGTGAACCACGACATGTCCCAGCGTGCCTTGCTGGAGGCGGCGGCCGAGGGGCTG 

55900 + + + + + + 55959 

GGTCGGCACTTGGTGCTGTACAGGGTCGCACGGAACGACCTCCGCCGCCGGCTCCCCGAC 

CGGCGGCTGGCCGGCGACGCGCGGTGCCGGAGCGCGTCGGCCGCGCCCTCCTCGGCATTG 

55960 + + + + + + 56019 

GCCGCCGACCGGCCGCTGCGCGCCACGGCCTCGCGCAGCCGGCGCGGGAGGAGCCGTAAC 

AGGGACATGTTCTCCCCCGCCGCCCGCCGGTACGTGCTCGCCTCGGACCGCGCGGGGTTC 

56020 + + + + + + 56079 

TCCCTGTACAAGAGGGGGCGGCGGGCGGCCATGCACGAGCGGAGCCTGGCGCGCCCCAAG 

35 (ORF35) MFSPAARRYVLASDRAGF

TTCGAGCAGGCTGTCCGGCTGCGCTCCCGGGGGTACCGGGTGAGCGCGGAGTTCGTCGGC 

56080 + + + + + + 56139

AAGCTCGTCCGACAGGCCGACGCGAGGGCCCCCATGGCCCACTCGCGCCTCAAGCAGCCG 
35 FEQAVRLRSRGYRVSAEFVG 

CCCGATCAGGGAGCCACCGACGCCCTCCACGCGGAGCACGTGGTCGAAGAGCACCTGAGG 

56140 + + + + + + 56199 

GGGCTAGTCCCTCGGTGGCTGCGGGAGGTGCGCCTCGTGCACCAGCTTCTCGTGGACTCC 
35 PDQGATDALHAEHVVEEHLR 

CTGCTCGATCAGGAGCCGGCCCCTGACCGGATCGGTGTGGACGTCTCCCGGATCGGCCTC 

56200 + + + + + + 56259 

GACGAGCTAGTCCTCGGCCGGGGACTGGCCTAGCCACACCTGCAGAGGGCCTAGCCGGAG
35 LLDQEPAPDRIGVDVSRIGL 

GCCCACTCGGCGCAGACTGCCCTGCGCAACACCGGGCGGCTGGCTGCCGCTGCGGCGCTC 

56260 + + + + + + 56319 

CGGGTGAGCCGCGTCTGACGGGACGCGTTGTGGCCCGCCGACCGACGGCGACGCCGCGAG

CGCGGGAGCGAGGTCGTCCTGCTCATGGAGGGGTCCGAGGACATCGACACCGTGCTGGCC

56320 + + + + + + 56379 

GCGCCCTCGCTCCAGCAGGACGAGTACCTCCCCAGGCTCCTGTAGCTGTGGCACGACCGG 

RGSEVVLLMEGSEDIDTVLA 

GTCCATGACGCCCTGGTGAACCGTTACGACAACGTGGGGATCACCCTTCAGGCGCACCTG 

56380 + + + + + + 56439 

CAGGTACTGCGGGACCACTTGGCAATGCTGTTGCACCCCTAGTGGGAAGTCCGCGTGGAC 

VHDALVNRYDNVGITLQAHL

CACCGCACCGTGGACGACGCCATGGCGGTCGCGGGTCCTGGCCGCACCGTGCGGCTGGTC 

56440 + + + + + + 56499 

GTGGCGTGGCACCTGCTGCGGTACCGCCAGCGCCCAGGACCGGCGTGGCACGCCGACCAG

HRTVDDAMAVAGPGRTVRLV


ATGGGCTCCTCGGCCGAGCCTGCCGGCACCGCTCTGTCCCGGGGCCCCGCTCTGGAGGAC 

56500 + + + + + + 56559 

TACCCGAGGAGCCGGCTCGGACGGCCGTGGCGAGACAGGGCCCCGGGGCGAGACCTCCTG 

MGSSAEPAGTALSRGPALED 

CGGTACCTTGACCTCGCGGAGCTTCTCGTGGACCGTGGCGTCCGGCTGAGTCTGGCCACT 

56560 + + + + + + 56619 

GCCATGGAACTGGAGCGCCTCGAAGAGCACCTGGCACCGCAGGCCGACTCAGACCGGTGA 

RYLDLAELLVDRGVRLSLAT 

CCGGACGCCGAGGTCCTGGCCGGGGCGCAGGAGCGTGGTCTGCTCGAACGCGTCCAGGAC 

56620 + + + + + + 56679 

GGGCTGCGGCTCCAGGACCGGCCCCGCGTCCTCGCACCAGACGAGCTTGCGCAGGTCCTG 
PDAEVLAGAQERGLLERVQD 

ATCGAGATGCTCTACGGTGTGCGGCCCGAGCTGCTGCGCCGCCACCGGGCGGCGGGCCGC 

56680 + + + + + + 56739 

TAGCTCTACGAGATGCCACACGCCGGGCTCGACGACGCGGCGGTGGCCCGCCGCCCGGCG 
IEMLYGVRPELLRRHRAAGR 

CCCTGTCGCATCCACGCGGCCTACGGGATGAACTGGTGGCTTCCCCTGCTGCGGAGGCTG 

56740 + + + + + + 56799 

GGGACAGCGTAGGTGCGCCGGATGCCCTACTTGACCACCGAAGGGGACGACGCCTCCGAC 

PCRIHAAYGMNWWLPLLRRL 

GCCGACAACCCGCCGATGGTGCTCAACGCCCTGGCCGACATCGGCCGGGACCGGGAGCCC 

56800 + + + + + + 56859 

CGGCTGTTGGGCGGCTACCACGAGTTGCGGGACCGGCTGTAGCCGGCCCTGGCCCTCGGG 
ADNPPMVLNALADIGRDREP 

GTCGCCCACCAGGCGTACTGACCCGCCCCGGGCCGCGATCCGCGGGGCACCGGCCCCGGG 

56860 + + + + + + 56919 

CAGCGGGTGGTCCGCATGACTGGGCGGGGCCCGGCGCTAGGCGCCCCGTGGCCGGGGCCC 

V A H Q A Y * (ORF35)

GCGCCGGTCAGCTCCCGGTCGCCGCGAACTGCCCGGGCCTGCGCCCCTCGCCCGCCGGCC 

56920 + + + + + + 56979 

CGCGGCCAGTCGAGGGCCAGCGGCGCTTGACGGGCCCGGACGCGGGGAGCGGGCGGCCGG 
(ORF36) * SGTAAFQGPRRGEGAP 

CCCGGTAGGCCTGGGCGATGTCCAGCCACTTCTCCGCCTCCTGACCAGACGCGGTCAGGG 

56980 + + + + + + 57039

GGGCCATCCGGACCCGCTACAGGTCGGTGAAGAGGCGGAGGACTGGTCTGCGCCAGTCCC 
GRYAQAIDLWKEAEQGSATL

CGAGGTCGTCGCGGTGGCGGCGCCGGGTGACCAGCAGGCAGAAGTCGTGCGCGGGACCGC 

57040 + + + + + + 57099 

GCTCCAGCAGCGCCACCGCCGCGGCCCACTGGTCGTCCGTCTTCAGCACGCGCCCTGGCG 
ALDDRHRRRTVLLCFDHAPG

TGACCGTCTCGGTGGCGTCCTCGGGGCCGACCGTCCAGACCTCGCCCGAGGGGGCGGTGA

57100 + + + + + + 57159 

ACTGGCAGAGCCACCGCAGGAGCCCCGGCTGGCAGGTCTGGAGCGGGCTCCCCCGCCACT 
36 SVTETADEPGVTWVEGSPAT

GCTCGAAGCGGAACGGCGCGGCCGGCGGGGTCAGACCGTGGGACTCGTAGCCGAAGTCGC 

57160 + + + + + + 57219 

CGAGCTTCGCCTTGCCGCGCCGGCCGCCCCAGTCTGGCACCCTGAGCATCGGCTTCAGCG 
36 LEPRFPAAPPTLGHSEYGFD 

GTGTCAGCCAGGCGAAGTCGACGATGTTGCGAAGCCGCTCGGTGGGCGTGCGCCGGACAC 

57220 + + + + + + 57279 

CACAGTCGGTCCGCTTCAGCTGCTACAACGCTTCGGCGAGCCACCCGCACGCGGCCTGTG 
36 RTLWAFDVINRLRETPTRRV 

CCAGGGCGTCGGCGACGTCCTGGCCGTGGGCGAACACCTCCATGATCCCGGCGCAGCCCA 

57280 + + + + + + 57339 

GGTCCCGCAGCCGCTGCAGGACCGGCACCCGCTTGTGGAGGTACTAGGGCCGCGTCGGGT 
36 GLADAVDQGHAFVEMIGACG

GAACGACCGGCGGCAGCGGGTTGACCAGCCACGGAACCACCTGGCCGGCGGGGACCGCGG 

57340 + + + + + + 57399 

CTTGCTGGCCGCCGTCGCCCAACTGGTCGGTGCCTTGGTGGACCGGCCGCCCCTGGCGCC 
36 LVVPPLPNVLWPVVQGAPVA 

CGAGCGCCTCGACCGAGGCCCGCCCCATGCCCCGGAAGCGGGTGAGCAGTTCCTGCGGCG 

57400 + + + + + + 57459 

GCTCGCGGAGCTGGCTCCGGGCGGGGTACGGGGCCTTCGCCCACTCGTCAAGGACGCCGC 
36 ALAEVSARGMGRFRTLLEQP 

GGAAGCCCTTGAACTGCTGCAGAGCCGCGTTGACCGCTCCGTCGAAGTTGCCTGCCGCGG 

57460 + + + + + + 57519 

CCTTCGGGAACTTGACGACGTCTCGGCGCAACTGGCGAGGCAGCTTCAACGGACGGCGCC 
36 PFGKFQQLAANVAGDFNGAA

CGGCCGTGACGGCCTTGAACTCCTCCGGCGCCGCCGCCGCGGTCCTGGCCAGGTTGAAGA 

57520 + + + + + + 57579 

GCCGGCACTGCCGGAACTTGAGGAGGCCGCGGCGGCGGCGCCAGGACCGGTCCAACTTCT 
36 AATVAKFEEPAAAATRALNF

CGAAGGTGAGGTGGGCGATCTGGTCGGTGACGGTCCAGCCGGGCGCCGGCGTCGGAGTGT 

57580 + + + + + + 57639 

GCTTCCACTCCACCCGCTAGACCAGCCACTGCCAGGTCGGCCCGCGGCCGCAGCCTCACA 
36 VFTLHAIQDTVTWGPAPTPT 

TCCAGGCTTCGTCGTCGATCTTCTCGACCAGCTGCGCCAGCTCCTCGATGTCGGTGGCCA 

57640 + + + + + + 57699 

AGGTCCGAAGCAGCAGCTAGAAGAGCTGGTCGACGCGGTCGAGGAGCTACAGCCACCGGT 
36 NWAEDDIKEVLQALEEIDTA 

GGTGCTTGAGGACGTCGTCGAGCGAATTCATCTCGTACTTCCTTCACTGGGGGTGTTCCG 

57700 + + + + + + 57759 

CCACGAACTCCTGCAGCAGCTCGCTTAAGTAGAGCATGAAGGAAGTGACCCCCACAAGGC
36 LHKLVDDLSNM (ORF36) 

GGCTGGGACGGATGTCCCGCCGGGTGGGCCGGCGGCCGGCGGAAGCGCCGTCGCGGAGCG 

57760 + + + + + + 57819 

CCGACCCTGCCTACAGGGCGGCCCACCCGGCCGCCGGCCGCCTTCGCGGCAGCGCCTCGC 

TCGGCGACAGTCGCTAGGCGGCGCGTCCCGCGTAGGAGCCGGCCCGGTCGGAATAGGGCG 

57820 + + + + + + 57879 

AGCCGCTGTCAGCGATCCGCCGCGCAGGGCGCATCCTCGGCCGGGCCAGCCTTATCCCGC 
37 (ORF37) *AARGAYSGARDSYP 

CGAGCGCCTCGGCCAGGGCTTCGGGTATCAGGGTCGGCACGGTCGCCGTGTTGGGGCCGC

57880 + + + + + + 57939

GCTCGCGGAGCCGGTCCCGAAGCCCATAGTCCCAGCCGTGCCAGCGGCACAACCCCGGCG 
37 ALAEALAEPILTPVTATNPG 

GCATGCAGGCGATGCGCTGGCGTCCCCGCGCCACCAGGGTCTCGCCGCCGTCGTCGCCCA 

57940 + + + + + + 57999 

CGTACGTCCGCTACGCGACCGCAGGGGCGCGGTGGTCCCAGAGCGGCGGCAGCAGCGGGT 
37 RMCAIRQRGRAVLTEGGDDG 

GCTTGATGTAGTCGAAGGTGAACTCCAGCTGGGTCTGCCGCAGCTCCGAGAGCCTCATCC 

58000 + + + + + + 58059 

CGAACTACATCAGCTTCCACTTGAGGTCGACCCAGACGGCGTCGAGGCTCTCGGAGTAGG 
37 LKIYDFTFELQTQRLESLRM 

GGATCGACAGTTCGTCGAAGGCGGTGATCTCCGCGAAGAACTCGCAGTCCACCTTGAGGG 

58060 + + + + + + 58119 

CCTAGCTGTCAAGCAGCTTCCGCCACTAGAGGCGCTTCTTGAGCGTCAGGTGGAACTCCC 
37 RISLEDFATIEAFFECDVKL 

TGAAGAGCTTGAGGTCCTCCTGGACCTCGGCGAGCACCGAAGGCGCCCTCTCCTTGAGAA 

58120 + + + + + + 58179 

ACTTCTCGAACTCCAGGAGGACCTGGAGCCGCTCGTGGCTTCCGCGGGAGAGGAACTCTT 
37 TFLKLDEQVEALVSPAREKL 

AGAGTTCCCGGCAACGCCCCTGCCAACGAAGGTAGTTGACGTAGTAGACGTTGCCGACGA 

58180 + + + + + + 58239 

TCTCAAGGGCCGTTGCGGGGACGGTTGCTTCCATCAACTGCATCATCTGCAACGGCTGCT 
37 FLERCRGQWRLYNVYYVNGV 

GGTTCGTCTCCTCGAAGCCGACGGTGTGGCGGAGCTCGAAGTAGTCAGGATTCGTCGCGG 

58240 + + + + + + 58299

CCAAGCAGAGGAGCTTCGGCTGCCACACCGCCTCGAGCTTCATCAGTCCTAAGCAGCGCC 
37 LNTEEFGVTHRLEFYDPNTA 

TCATAGGTCTGTGCCCTTCGTCGTCGGGGCCGGTCGTCGCACCGAGTTGCGTGAAGCAAC 

58300 + + + + + + 58359 

AGTATCCAGACACGGGAAGCAGCAGCCCCGGCCAGCAGCGTGGCTCAACGCACTTCGTTG 

37 T M (ORF37) 

TCACTGGTCGCGATGGCCTGCGGGGTCGGTGGCCCGCGCTCCGGGCGGAGAGTGCGGGCG 

58360 + + + + + + 58419 

AGTGACCAGCGCTACCGGACGCCCCAGCCACCGGGCGCGAGGCCCGCCTCTCACGCCCGC 

GGGTGCCGGCCGGCGCGGGGTCAGCCGCGCGCCGACGGCAGCAGGGGAAGAACCCTCTCG 

58420 + + + + + + 58479 

CCCACGGCCGGCCGCGCCCCAGTCGGCGCGCGGCTGCCGTCGTCCCCTTCTTGGGAGAGC 

38 (ORF38) *GRASPLLPLVRE

CGGCCGCTCGTGGAGCCGTCGGGGGCCGGTGCGCCGTAGGTGACGGAGATACCCCGGCTC 

58480 + + + + + + 58539 

GCCGGCGAGCACCTCGGCAGCCCCCGGCCACGCGGCATCCACTGCCTCTATGGGGCCGAG 
38 RGSTSGDPAPAGYTVSIGRS

TGCGCGGCGCGCACGATCCCCGGCATCGCGCGTTCGGCGAGCGCCGCGATGGTCATCGCG 

58540 + + + + + + 58599 

ACGCGCCGCGCGTGCTAGGGGCCGTAGCGCGCAAGCCGCTCGCGGCGCTACCAGTAGCGC 
38 QAARVIGPMAREALAAITMA- 

GGATTGACCGTCAGCGCGCCGGGAACCGACGATCCGTCGGTGACGAAGATCCCCGGGTGG 

58600 + + + + + + 58659 

CCTAACTGGCAGTCGCGCGGCCCTTGGCTGCTAGGCAGCCACTGCTTCTAGGGGCCCACC 
38 PNVTLAGPVSSGDTVFIGPH 

TCGCGGAGCTCGTTGCTGTCGTCCAGGGCGGATGTGTGGGGGTCGTCGCCCATCCGGCAG 

58660 + + + + + + 58719 

AGCGCCTCGAGCAACGACAGCAGGTCCCGCCTACACACCCCCAGCAGCGGGTAGGCCGTC
38 DRLENSDDLASTHPDDGMRC

GAGGAGAGCGGGTGGACGGTGTAGGCGCCGACGAGGTCGTTGGTCCAGGGCATGACCTTG 

58720 + + + + + + 58779 

CTCCTCTCGCCCACCTGCCACATCCGCGGCTGCTCCAGCAACCAGGTCCCGTACTGGAAC 
38 SSLPHVTYAGVLDNTWPMVK- 

GCCAGGCCGTCCTTCTCCAGGATCTCCTTGACCTCGGCGTCGGATGCGGCCCAGGCGCCC 

58780 + + + + + + 58839 

CGGTCCGGCAGGAAGAGGTCCTAGAGGAACTGGAGCCGCAGCCTACGCCGGGTCCGCGGG 
38 ALGDKELIEKVEADSAAWAG

AGGGTGTTCTTCGTCGGGTCGTAGCGCAGGTTGCCCCGGCCGAGCATCTGCTGGGAGATG 

58840 + + + + + + 58899 

TCCCACAAGAAGCAGCCCAGCATCGCGTCCAACGGGGCCGGCTCGTAGACGACCCTCTAC 
38 LTNKTPDYRLNGRGLMQQSI

CGGTGGGCGTTACCGGTGGCGGGAGGGGGGCCGAAGACGCCTTCGTTGTCGTCCTCGATC 

58900 + + + + + + 58959 

GCCACCCGCAATGGCCACCGCCCTCCCCCCGGCTTCTGCGGAAGCAACAGCAGGAGCTAG 
38 RHANGTAPPPGFVGENDDEI- 

ATCGTGAAGATCGTGAGCCAGGAGGTCCACTGCTTCAGGATCTCCTTCTTCTCCTTGCCG 

58960 + + + + + + 59019 

TAGCACTTCTAGCACTCGGTCCTCCAGGTGACGAAGTCCTAGAGGAAGAAGAGGAACGGC
38 MTFITLWSTWQKLIEKKEKG- 

AACCAGGAGGGGCCCGTGGCGCCGGGCACCTGGGCGAGGATCGTGCCGAGGCCCGGCGGG 

59020 + + + + + + 59079 

TTGGTCCTCCCCGGGCACCGCGGCCCGTGGACCCGCTCCTAGCACGGCTCCGGGCCGCCC 
38 FWSPGTAGPVQALITGLGPP 

AAGTAGAGCTGTTCCAGGGAGTAGCGGGAGTACTCGGGCAACGAGCCGTCCAGCCTGTCC 

59080 + + + + + + 59139 

TTCATCTCGACAAGGTCCCTCATCGCCCTCATGAGCCCGTTGCTCGGCAGGTCGGACAGG 
38 FYLQELSYRSYEPLSGDLRD- 

CAGCTCGCCACGGTGGGCCCCTTGCCGATCTGGTTGGCCGCGTAGGCGAGCCCGTCGCCC 

59140 + + + + + + 59199 

GTCGAGCGGTGCCACCCGGGGAACGGCTAGACCAACCGGCGCATCCGCTCGGGCAGCGGG 
38 WSAVTPGKGIQNAAYALGDG- 

CGGTCCAGGCCGAACAGCTCGGCCGCCTTGGCCTCGTCGATGATGGCGGTGTTGAGCCGC 

59200 + + + + + + 59259 

GCCAGGTCCGGCTTGTCGAGCCGGCGGAACCGGAGCAGCTACTACCGCCACAACTCGGCG 
38 RDLGFLEAAKAEDIIATNLR-

TCGCCGTTGCCGGAGAAGTAGCGTCCGACCGCTCGTGGCATGGTGCCCAGGTGGGCCTCG 

59260 + + + + + + 59319 

AGCGGCAACGGCCTCTTCATCGCAGGCTGGCGAGCACCGTACCACGGGTCCACCCGGAGC 
38 EGNGSFYRGVARPMTGLHAE

CTGCGCTGGAGGATCACCGGGGTCGCGCCCGCGCCGGCCGCCATCACCACGATCTTCGCC 

59320 + + + + + + 59379 

GACGCGACCTCCTAGTGGCCCCAGCGCGGGCGCGGCCGGCGGTAGTGGTGCTAGAAGCGG 
38 SRQLIVPTAGAGAAMVVIKA-

TCGATGACGCCGCTGCCCGCCTGGAGGCGGTAGTCGTCGTCGTGCACGACGTTGTAGTGC 

59380 + + + + + + 59439 

AGCTACTGCGGCGACGGGCGGACCTCCGCCATCAGCAGCAGCACGTGCTGCAACATCACG 
38 EIVGSGAQLRYDDDHVVNYH

ACCCGGTAGGAGCCGTCGGGGGTGCGCGAGAGGTGCTGGACCTCGTGCAGCGGGCGGATG 

59440 + + + + + + 59499 

TGGGCCATCCTCGGCAGCCCCCACGCGCTCTCCACGACCTGGAGCACGTCGCCCGCCTAC 
38 VRYSGDPTRSLHQVEHLPRI-

CGCGCCCCATGGGCGATGGCGGCGGGCAGGTAGTTGACCAGCAAGGACTGCTTGGCCTCG

59500 + + + + + + 59559 

GCGCGGGGTACCCGCTACCGCCGCCCGTCCATCAACTGGTCGTTCCTGACGAACCGGAGC 
38 RAGHAIAAPLYNVLLSQKAE 

AAGCGGCAGCCGGCCATCATCCAGTTGCAGTTCACGCACTTGGTGTTGTCGATGGCGACG 

59560 + + + + + + 59619 

TTCGCCGTCGGCCGGTAGTAGGTCAACGTCAAGTGCGTGAACCACAACAGCTACCGCTGC 
38 FRCGAMMWNCNVCKTNDIAV- 

GCGAGGGGGTTGGCGGTGCGGCCGGCGTGGTTGCACGCCGCGGCCCACAGTCCGCCGGCG 

59620 + + + + + + 59679 

CGCTCCCCCAACCGCCACGCCGGCCGCACCAACGTGCGGCGCCGGGTGTCAGGCGGCCGC 
38 ALPNATRGAHNCAAAWLGGA- 

TAGCTCACGTCGTTCCAGTCCTGCCGGGTCACGGAGAGGGACTCCTCGACACGGTCGTAC 

59680 + + + + + + 59739 

ATCGAGTGCAGCAAGGTCAGGACGGCCCAGTGCCTCTCCCTGAGGAGCTGTGCCAGCATG 
38 YSVDNWDQRTVSLSEEVRDY- 

CAGGGGTCCAGGGTTTCGCGGCTCACCGCCTGCGGCCACATCCGGCGTCCTATGGACCCC 

59740 + + + + + + 59799 

GTCCCCAGGTCCCAAAGCGCCGAGTGGCGGACGCCGGTGTAGGCCGCAGGATACCTGGGG 
38 WPDLTERSVAQPWMRRGISG

TGCCGGTCGAAGACGAAGCGCGGGGCGCGGGGCATCGCGGCGAAGTAGACGACGCTGCCG 

59800 + + + + + + 59859 

ACGGCCAGCTTCTGCTTCGCGCCCCGCGCCCCGTAGCGCCGCTTCATCTGCTGCGACGGC 
38 QRDFVFRPARPMAAFYVVSG- 

CCGCCCACACAGTTCCCGCCGAGGATGCTCATGCCGTCCCCGACCGTGAAGTCGAACGCC 

59860 + + + + + + 59919 

GGCGGGTGTGTCAAGGGCGGCTCCTACGAGTACGGCAGGGGCTGGCACTTCAGCTTGCGG 
38 GGVCNGGLISMGDGVTFDFA-

CTCGTGTACGAGGAGCCGAGTTTGTAGTCGTGCTCGAACTCCTTGCTCTCCAGCCACGGC 

59920 + + + + + + 59979 

GAGCACATGCTCCTCGGCTCAAACATCAGCACGAGCTTGAGGAACGAGAGGTCGGTGCCG 
38 RTYSSGLKYDHEFEKSELWP 

CCGCGTTCCAGGACGGTGACGTCGGCGCCCCCCGCCGCCAGGTGGTAGGCGGCGATGGCA 

59980 + + + + + + 60039 

GGCGCAAGGTCCTGCCACTGCAGCCGCGGGGGGCGGCGGTCCACCATCCGCCGCTACCGT 
38 GRELVTVDAGGAALHYAAIA- 

CCGCCGAATCCGCTGCCGATGACGAGGACGTCCGTGCGCTCGGCCGTGGTGCTCATGCGG 

60040 + + + + + + 60099 

GGCGGCTTAGGCGACGGCTACTGCTCCTGCAGGCACGCGAGCCGGCACCACGAGTACGCC 

38 GGFGSGIVLVDTREATTSM 

(ORF39) * A

GGCTCCCGGTGGACGTGGTGTCGGGGTGGAGGCGGGCGAACTCACGCCCGTAGCTGTAAT 

60100 + + + + + + 60159 

CCGAGGGCCACCTGCACCACAGCCCCACCTCCGCCCGCTTGAGTGCGGGCATCGACATTA 

39 PSGTSTTDPHLRAFERGYSY 

CCTTGAAGCGCCACAGGCCGTCGGCGTCCGGCATGCTCAGGCCCATGGCCTCCAGTCCCG 

60160 + + + + + + 60219 

GGAACTTCGCGGTGTCCGGCAGCCGCAGGCCGTACGAGTCCGGGTACCGGAGGTCAGGGC 
39 DKFRWLGDADPMSLGMAELG 

GATGGCCGTCCTCCATCGCCTGTGCCGTGTTGAGGTGCGCGGCCGAATCGAAGGCCATGT 

60220 + + + + + + 60279 

CTACCGGCAGGAGGTAGCGGACACGGCACAACTCCACGCGCCGGCTTAGCTTCCGGTACA 
39 PHGDEMAQATNLHAASDFAM

TGCAGAAGAGGGACAGCAGCACCCAGAACTCCTTCTCGGGGTGGCCTGGTGTCGTCAGCC

60280 + + + + + + 60339 

ACGTCTTCTCCCTGTCGTCGTGGGTCTTGAGGAAGAGCCCCACCGGACCACAGCAGTCGG 
39 NCFLSLLVWFEKEPHGPTTL

GCTGGATCAGCGCGGCCCGGTCCGGGTAGTCGAGCGCCACGAAGGGCGGGACCGTCGGGT 

60340 + + + + + + 60399

CGACCTAGTCGCGCCGGGCCAGGCCCATCAGCTCGCGGTGCTTCCCGCCCTGGCAGCCCA 
39 RQILAARDPYDLAVFPPVTP 

CGGGAGCCAGGCGGCGCTCCGCCGCGTAGGCCAGCGCGTGCTCGTTCACCAGGCGCACCA 

60400 + + + + + + 60459 

GCCCTCGGTCCGCCGCGAGGCGGCGCATCCGGTCGCGCACGAGCAAGTGGTCCGCGTGGT 
39 DPALRREAAYALAHENVLRV 

GGTCGTCCAGACCCTCGTGGATGCCGGTCGCATCCCATTGCAGGAGCTCCAGGGCTCCCG 

60460 + + + + + + 60519

CCAGCAGGTCTGGGAGCACCTACGGCCAGCGTAGGGTAACGTCCTCGAGGTCCCGAGGGC 
39 LDDLGEHIGTADWQLLELAG 

CCTGGACGGCGCCACCGCCGGTGGACACCCCCGCGATGGCCCGGTCGTCCGCGAAGCGCT 

60520 + + + + + + 60579 

GGACCTGCCGCGGTGGCGGCCACCTGTGGGGGCGCTACCGGGCCAGCAGGCGCTTCGCGA 
39 AQVAGGGTSVGAIARDDAFR 

TCTGGCCCGGCACGATCGTGTCCGCGTAGGCCTCCAGGGTCATGGTCCGGATATCGCCGG 

60580 + + + + + + 60639 

AGACCGGGCCGTGCTAGCACAGGCGCATCCGGAGGTCCCAGTACCAGGCCTATAGCGGCC 

39 KQGPVITDAYAELTM (ORF39)

CCGGCGCCCCTCGCTCATTGTCGTCGCGCAACTCGCTCTCCATTCTCGCAGTCCGGAGTG 

60640 + + + + + + 60699 

GGCCGCGGGGAGCGAGTAACAGCAGCGCGTTGAGCGAGAGGTAAGAGCGTCAGGCCTCAC 

GGATGCCTTGTGGCGAGGAGAAAGCTAGGTTCGTTCGACCGGTTCAAGCAACTAGCCAAA 

60700 + + + + + + 60759 

CCTACGGAACACCGCTCCTCTTTCGATCCAAGCAAGCTGGCCAAGTTCGTTGATCGGTTT 

GTCGAGGCGACCTTGAAACCGACTCCACGGAGTTGGCGCGAAGCGGCGGATGGATTACAC 

60760 + + + + + + 60819 

CAGCTCCGCTGGAACTTTGGCTGAGGTGCCTCAACCGCGCTTCGCCGCCTACCTAATGTG 

GCGCGGGCGAGCGGCTCACTAGTCTGGCCGCACGGATGTCTTCATCACCTGCACGTGGAA 

60820 + + + + + + 60879 

CGCGCCCGCTCGCCGAGTGATCAGACCGGCGTGCCTACAGAAGTAGTGGACGTGCACCTT 

AAGCTTCTGCACGGGCACCGCATGTGGAAGTGAGCCCTGGTCTCATGTCTTGGGGGAAAC 

60880 + + + + + + 60939 

TTCGAAGACGTGCCCGTGGCGTACACCTTCACTCGGGACCAGAGTACAGAACCCCCTTTG 

GTGAAAAGTGACTCTGCCCAACGCGCCGTGGAGCGATCACGCCGTGTCGTACGGATCGAT 

60940 + + + + + + 60999 

CACTTTTCACTGAGACGGGTTGCGCGGCACCTCGCTAGTGCGGCACAGCATGCCTAGCTA 

40 VKSDSAQRAVERSRRVVRID 

(ORF40) 

GAACTCATTCCCGCCGATTCCCCGCGCCTGAACGGAATCGATCGTTCCCATGTGCAGCGC 

61000 + + + + + + 61059 

CTTGAGTAAGGGCGGCTAAGGGGCGCGGACTTGCCTTAGCTAGCAAGGGTACACGTCGCG 
40 ELIPADSPRLNGIDRSHVQR

CTCGCGACCGTGTACGCGTCCCTGCCGCCGGTCCTGGTGCACCGCCCGACCATGCGGGTC 

61060 + + + + + + 61119 

GAGCGCTGGCACATGCGCAGGGACGGCGGCCAGGACCACGTGGCGGGCTGGTACGCCCAG 
40 LATVYASLPPVLVHRPTMRV

GTCGACGGCATGCACCGCATCGGCGCGGCCCGCCTGAAGGGGCTGGACACGGTCGAGGTC

61120 + + + + + + 61179 

CAGCTGCCGTACGTGGCGTAGCCGCGCCGGGCGGACTTCCCCGACCTGTGCCAGCTCCAG 
40 VDGMHRIGAARLKGLDTVEV 

ACCTTCTTCGAGGGCGCCGAGGAGCAGGTGTTCCTGCGTTCCGTCGCGGCGAACATCACC 

61180 + + + + + + 61239 

TGGAAGAAGCTCCCGCGGCTCCTCGTCCACAAGGACGCAAGGCAGCGCCGCTTGTAGTGG 
40 TFFEGAEEQVFLRSVAANIT 

AACGGCCTGCCGTTGTCGGTGGCCGACCGCAAGACCGCCGCGGCCCGCATTCTGGCCTCC 

61240 + + + + + + 61299 

TTGCCGGACGGCAACAGCCACCGGCTGGCGTTCTGGCGGCGCCGGGCGTAAGACCGGAGG 
40 NGLPLSVADRKTAAARILAS

CACCCGACCCTGTCCGACCGCGCGGTCGCCGCACACGTCGGCCTCGACGCCAAGACCGTG 

61300 + + + + + + 61359 

GTGGGCTGGGACAGGCTGGCGCGCCAGCGGCGTGTGCAGCCGGAGCTGCGGTTCTGGCAC 
40 HPTLSDRAVAAHVGLDAKTV 

GCGGGGGTACGGACGTGTTCAGCCGCGGGTTCTCCGCTGCTGAACATGCGCACCGGGGCG 

61360 + + + + + + 61419 

CGCCCCCATGCCTGCACAAGTCGGCGCCCAAGAGGCGACGACTTGTACGCGTGGCCCCGC 
40 AGVRTCSAAGSPLLNMRTGA

GACGGCCGCGTCCACCCGTTGGACCGCACCGCCGAACGCCTGCACGCGGCCGCGCTGCTG 

61420 + + + + + + 61479 

CTGCCGGCGCAGGTGGGCAACCTGGCGTGGCGGCTTGCGGACGTGCGCCGGCGCGACGAC 
40 DGRVHPLDRTAERLHAAALL 

ACCCAGGACCCGGGACTCCCGTTGCGCTCCGTCGTCGAGCAGACGGGGCTGTCGCTGGGC 

61480 + + + + + + 61539 

TGGGTCCTGGGCCCTGAGGGCAACGCGAGGCAGCAGCTCGTCTGCCCCGACAGCGACCCG 
40 TQDPGLPLRSVVEQTGLSLG 

ACGGCCCACGACGTCCGCCGTCGGCTGCTGCGGGGCGAGGACCCGGTCCCGCAGAACCGG 

61540 + + + + + + 61599 

TGCCGGGTGCTGCAGGCGGCAGCCGACGACGCCCCGCTCCTGGGCCAGGGCGTCTTGGCC 
40 TAHDVRRRLLRGEDPVPQNR

CAGAGCGCGATGCTGGAGCCGGGACTCGCCCCGCAGAAGAAGGCGACGGCCAAGCCGCCC 

61600 + + + + + + 61659 

GTCTCGCGCTACGACCTCGGCCCTGAGCGGGGCGTCTTCTTCCGCTGCCGGTTCGGCGGG 
40 QSAMLEPGLAPQKKATAKPP 

GTCGGCCCGGCCGCCCGTCCGGTCCCGAAGGTGCCGCCCGCCGTCGCCGGCAGGCCGCCG 

61660 + + + + + + 61719 

CAGCCGGGCCGGCGGGCAGGCCAGGGCTTCCACGGCGGGCGGCAGCGGCCGTCCGGCGGC 
40 VGPAARPVPKVPPAVAGRPP 

GTGTCACCGCGGTCCCGGGCCCCGCTGGAGGCGCTGCGCAAGCTCTCCAACGACCCCTCC 

61720 + + + + + + 61779 

CACAGTGGCGCCAGGGCCCGGGGCGACCTCCGCGACGCGTTCGAGAGGTTGCTGGGGAGG 
40 VSPRSRAPLEALRKLSNDPS 

CTGCGCCACTCCGACCAGGGGCGCGAACTCATGCGCTGGCTGCACAACCGGTTCGTCGTC 

61780 + + + + + + 61839 

GACGCGGTGAGGCTGGTCCCCGCGCTTGAGTACGCGACCGACGTGTTGGCCAAGCAGCAG 
40 LRHSDQGRELMRWLHNRFVV 

GACGAGGCGTGGCGCCGGCGCGCGGACGCGGTCCCGGCCCACTGCGTCGACTCGATGGCG 

61840 + + + + + + 61899 

CTGCTCCGCACCGCGGCCGCGCGCCTGCGCCAGGGCCGGGTGACGCAGCTGAGCTACCGC 
40 DEAWRRRADAVPAHCVDSMA

GAGCTGGCGCAGCACTGCTCGGACGCCTGGCACCGGTTCGCCGAGGAGATGGTTCGGCGC

61900 + + + + + + 61959 

CTCGACCGCGTCGTGACGAGCCTGCGGACCGTGGCCAAGCGGCTCCTCTACCAAGCCGCG 
40 ELAQHCSDAWHRFAEEMVRR 

CGGCACAGCGCCGCGGCCGACGGCTCCGGACTCCGCACGACTCAGCCAACTCGCCGTTGA 

61960 + + + + + + 62019 

GCCGTGTCGCGGCGCCGGCTGCCGAGGCCTGAGGCGTGCTGAGTCGGTTGAGCGGCAACT 

40 RHSAAADGSGLRTTQPTRR* 
(ORF40) - 

CGGCCTACTTCGACAGGGAGTTACGGTGACCACGAACACCATCGAGGACGCGGTCCGCCG 

62020 + + + + + + 62079 

GCCGGATGAAGCTGTCCCTCAATGCCACTGGTGCTTGTGGTAGCTCCTGCGCCAGGCGGC 
41 (ORF41) VTTNTIEDAVRR

GGTCGTCGAGTACATGCACGTCAACCTGGGTCAGAACCTCACGATCGATGACATGGCGCG 

62080 + + + + + + 62139 

CCAGCAGCTCATGTACGTGCAGTTGGACCCAGTCTTGGAGTGCTAGCTACTGTACCGCGC 
41 VVEYMHVNLGQNLTIDDMAR

CACGGCGATGTTCAGCAAGTTCCATTTCACCCGCATCTTCCGCGAAGTCACCGGTACCTC 

62140 + + + + + + 62199 

GTGCCGCTACAAGTCGTTCAAGGTAAAGTGGGCGTAGAAGGCGCTTCAGTGGCCATGGAG 

41 TAMFSKFHFTRIFREVTGTS

TCCCGGGCGTTTCCTGTCCGCCTTACGGATTCAGGAGGCCAAGAGACTTCTCGTGCACAC 

62200 + + + + + + 62259 

AGGGCCCGCAAAGGACAGGCGGAATGCCTAAGTCCTCCGGTTCTCTGAAGAGCACGTGTG 
41 PGRFLSALRIQEAKRLLVHT

TGCACTCAGTGTGGCCGATATCAGCAGTCAGGTCGGCTACAGCAGTGTCGGTACTTTCAG 

62260 + + + + + + 62319 

ACGTGAGTCACACCGGCTATAGTCGTCAGTCCAGCCGATGTCGTCACAGCCATGAAAGTC 
41 ALSVADISSQVGYSSVGTFS- 

TTCTCGCTTCAAGGCCTGTGTGGGGCTTTCCCCGAGCGCCTATCGCGACTTCGGCGGGGT 

62320 + + + + + + 62379 

AAGAGCGAAGTTCCGGACACACCCCGAAAGGGGCTCGCGGATAGCGCTGAAGCCGCCCCA 
41 SRFKACVGLSPSAYRDFGGV- 

GCAGCCGGGTTTTCCCTCCGCCGCGGCCCGTCTCACTCCCACCGCGCACAATCCCTCCGT 

62380 + + + + + + 62439 

CGTCGGCCCAAAAGGGAGGCGGCGCCGGGCAGAGTGAGGGTGGCGCGTGTTAGGGAGGCA 
41 QPGFPSAAARLTPTAHNPSV- 

GCGCGGCCGCATTCACTCCGCCCCGGGTGACAGGCCCGGAAGGATCTTCGTGGGCCTGTT 

62440 + + + + + + 62499 

CGCGCCGGCGTAAGTGAGGCGGGGCCCACTGTCCGGGCCTTCCTAGAAGCACCCGGACAA 
41 RGRIHSAPGDRPGRIFVGLF-

CCCCGGCAGGATGCGCCAGGGCCGCCCGGCGCGCTGGACCGTCATGGAGAGTCCCGGGGC 

62500 + + + + + + 62559 

GGGGCCGTCCTACGCGGTCCCGGCGGGCCGCGCGACCTGGCAGTACCTCTCAGGGCCCCG 
41 PGRMRQGRPARWTVMESPGA-

CTTCGAGCTCCGGGACGTGCCCGTGGGCACCTGGCACATCCTGGTCCACTCCTTCCCCGC 

62560 + + + + + + 62619 

GAAGCTCGAGGCCCTGCACGGGCACCCGTGGACCGTGTAGGACCAGGTGAGGAAGGGGCG
41 FELRDVPVGTWHILVHSFPA- 

CGGACACCGGCCGCACCAGCTCGACTCCGAACCGCTGTTGCTCGGGCACAGCGGACCGCT 

62620 + + + + + + 62679 

GCCTGTGGCCGGCGTGGTCGAGCTGAGGCTTGGCGACAACGAGCCCGTGTCGCCTGGCGA 
41 GHRPHQLDSEPLLLGHSGPL-

CGTGGTGCACCCCGGTGCCCTGCTCCGGCCGGCGGACATCCTCCTGCGCGCGGTGGACGC

62680 + + + + + + 62739 

GCACCACGTGGGGCCACGGGACGAGGCCGGCCGCCTGTAGGAGGACGCGCGCCACCTGCG 
41 VVHPGALLRPADILLRAVDA-

CCTCGATCCACCGGTCCTGCTGGCCCACTTCGCGCTGGAGAGCCGCCTCACCTCGCCGTA 

62740 + + + + + + 62799

GGAGCTAGGTGGCCAGGACGACCGGGTGAAGCGCGACCTCTCGGCGGAGTGGAGCGGCAT 

41 LDPPVLLAHFALESRLTSPY- 

42 (ORF42) * R A T -

CTCACCGTCATCGGTAGCCCTCCGCGCATCCGCAGGGAGAGCATGGGTTCGGCAACCGCC 

62800 + + + + + + 62859 

GAGTGGCAGTAGCCATCGGGAGGCGCGTAGGCGTCCCTCTCGTACCCAAGCCGTTGGCGG 

41 SPSSVALRASAGRAWVRQPP- 

42 SVTMPLGGRMRLSLMPEAVA- 

CGGTGTCCGGCGACGGTACGCAGATCGAGATCGCGGGTGACCAGGGCCGTGACGAACACC 

62860 + + + + + + 62919 

GCCACAGGCCGCTGCCATGCGTCTAGCTCTAGCGCCCACTGGTCCCGGCACTGCTTGTGG 

41 GVRRRYADRDRG* (ORF41) 

42 RHGAVTRLDLDRTVLATVFV- 

GCCTCCATCATCCCGAGGTTGCTGCCGACGCAGAACCGGGGCCCCGCGCCGAACGGGATG 

62920 + + + + + + 62979 

CGGAGGTAGTAGGGCTCCAACGACGGCTGCGTCTTGGCCCCGGGGCGCGGCTTGCCCTAC 

42 AEMMGLNSGVCFRPGAGFPI- 

TACGCGTACCGCGGCCGGTCGGCGGTCTGCCGGGGTTCGAACCGCTCGGGGTCGAAGCGC 

62980 + + + + + + 63039 

ATGCGCATGGCGCCGGCCAGCCGCCAGACGGCCCCAAGCTTGGCGAGCCCCAGCTTCGCG 

42 YAYRPRDATQRPEFREPDFR- 

TCGGGGTCCTCCCACAGCCCCGGATGGCGGTGCATGATGTACGGGCAGACCAGCACATCC 

63040 + + + + + + 63099 

AGCCCCAGGAGGGTGTCGGGGCCTACCGCCACGTACTACATGCCCGTCTGGTCGTGTAGG 

42 EPDEWLGPHRHMIYPCVLVD-

GATCCGGCGGACACCGTGTAGCCGCCGACCACATCGCGTTGCTGGGCCACCCTGGGCAGG

63100 + + + + + + 63159

CTAGGCCGCCTGTGGCACATCGGCGGCTGGTGTAGCGCAACGACCCGGTGGGACCCGTCC 

SGASVTYGGVVDRQQAVRPL 



ATCCC 

63160 + 63164 

TAGGG 
42 I G -